If you have watched a few KQL videos, copied some queries from a wiki, and changed a couple of column names without really being sure what you were doing, this guide is for you. The goal here is not to make you fast at KQL. The goal is to give you a mental picture of what is actually happening when you type a query, so that when something does not work the way you expected, you have somewhere to start looking.
Each section builds on the previous one. We start with what a query is as a piece of text, then move to filtering, then to picking columns, then to grouping, and so on. By the time you get to joins, you should already understand that a join is not a magical operator but just one more step in a sequence of table transformations. The exercises at the end are intentionally easy - the point is to wire the new ideas into your hands, not to test you.
Throughout the guide you will see four kinds of highlighted boxes. The blue ones are general points worth pausing on. The amber ones are warnings about things that look right but are wrong. The red ones are outright traps. The purple ones, marked From the field, are stories from real query reviews where something went sideways - those are some of the most valuable bits, because they show you what the warnings actually look like when they happen to you.
The single most important thing to internalize about KQL is this: every query is a sequence of steps, and every step takes a table as input and produces a table as output. There is no concept of looping, no concept of variables in the imperative sense, and almost no concept of "fetching one record." You start with a table, you pass it through a series of transformations, and what comes out the other end is the result you see.
The pipe character - the vertical bar | - is what connects one step to the next. You can
read it out loud as the word "then" or "and then". That single habit will change how queries feel to
you. When you look at a query, do not try to understand the whole thing at once. Read it left to right,
line by line, saying "then" every time you hit a pipe.
Here is the smallest meaningful query you can write:
SecurityEvent
| take 10
That is the whole shape of every KQL query you will ever write. The first line names a table. Every subsequent line that begins with a pipe is one more transformation step. You can have one step or thirty steps; it is all the same pattern.
A query in KQL is sometimes called a "pipeline" for exactly this reason. Each step is fed by the step before it and feeds the step after it. The output of one step is always a table - possibly with different columns, fewer rows, or aggregated values, but always a table. This is why you can stack steps freely: every step speaks the same language to its neighbors.
// this is a comment. There is no multi-line comment syntax. Use comments liberally, for
your own sake, and for the sake of others who may be reading/using your query.
Let us go back to the take 10 query and look at what comes back. The
SecurityEvent table holds Windows Security events forwarded from your domain-joined
machines, things like logons, logon failures, process creation, service installs, and so on. When you
run SecurityEvent | take 10, you get back ten rows from that table. Which ten? That is
actually an important question.
The answer is: any ten. Microsoft's documentation is explicit that take does not guarantee
any particular ordering - it gives you a sample of rows the engine happened to encounter first. This is
fine when you are exploring schema (you just want to see what columns exist and what they look like),
but it is a bad idea any time you actually care which rows you get. There is a different operator,
top, that orders the data first and then gives you the top however-many; I will get to that
one later.
For now, run SecurityEvent | take 10 in your environment and look at the result. You will
probably notice that SecurityEvent has dozens of columns. Most of them are empty for any
given row. This is because SecurityEvent is a wide table that has to accommodate every
possible Windows Security event, but each individual event only fills out a handful of those columns - a
logon event fills in LogonType and IpAddress, while a process-creation event
fills in NewProcessName and CommandLine, and so on. This is normal. Almost
every Sentinel table is built this way.
Before you do anything else with a new table, get used to running
TableName | take 10 against it just to see what columns exist and what the values
look like. Then run TableName | getschema, which gives you a tidy list of column names and
their data types. The two together are the cheapest possible way to orient yourself in a new dataset.
SecurityEvent
| getschema
| project ColumnName, ColumnType
| order by ColumnName asc
Notice the structure of that query. It has three steps. The first step is the table. The second step
asks for the schema, which produces a small table where each row describes one column of
SecurityEvent. The third step keeps only two columns of that schema-description table and
sorts them. The output is a clean alphabetical list of every column with its type. Read that out loud:
"Take SecurityEvent, then describe its schema, then keep only the column name and type, then sort
alphabetically." That is the whole query.
Almost every Sentinel table has a column called TimeGenerated. It is a
datetime value that records when the event was generated. This is the column you will
filter on more than any other, because the data behind these tables is enormous - often billions of rows
- and the engine indexes data by time. If you do not constrain the time range, the engine has to
consider far more data than it needs to, your query takes longer to run, and you may even hit a query
limit (a 10-minute timeout by default in Log Analytics).
The function you will use most for time filtering is ago(). It takes a duration and returns
a datetime that is that far in the past, relative to right now. So ago(1h) means "one hour
ago," ago(7d) means "seven days ago," and so on. The syntax for the duration uses suffixes:
s for seconds, m for minutes, h for hours, d for
days. There is no week or month suffix; you write seven days as 7d and a month as roughly
30d.
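If you want to see what ago() actually returns, the print operator evaluates scalar expressions without touching any table - a quick sketch:

print OneHourAgo = ago(1h), SevenDaysAgo = ago(7d), RoughMonthAgo = ago(30d)
// one row, three datetime columns, each that far in the past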
The conventional first step in any production query is to filter TimeGenerated to the
smallest range that still answers your question:
SecurityEvent
| where TimeGenerated > ago(1h)
| take 10
Time-filter first, before doing anything else. The reason is that the engine pushes time filters down to the storage layer - it never even reads data from outside your time window. By contrast, if you filter on a different column first and then filter on time, the engine has to do work it could have skipped. This will not matter on a tiny query in a tiny lab; it will matter a lot the first time you write a query against a busy production workspace.
It does not matter much whether you write where TimeGenerated > ago(1h) or
where TimeGenerated >= ago(1h). They are nearly identical for almost every realistic
case. What you should NOT do is write
where TimeGenerated > "2026-04-25". That is comparing a datetime to a string and you
will usually get either zero rows back or a confusing error. Always wrap explicit timestamps in either:

datetime(): where TimeGenerated > datetime(2026-04-25)
todatetime(): where TimeGenerated > todatetime(2026-04-25T15:22:00Z)
If you need a fixed window - say, between two specific timestamps - use between:
SecurityEvent
| where TimeGenerated between (datetime(2026-04-20) .. datetime(2026-04-21))
The two dots between the start and end times are the range syntax in KQL. You will see the same
a .. b form again with between (a .. b) for numeric ranges.
where
The where operator is the workhorse of KQL. You will use it more than any other operator.
Its job is to keep only the rows that satisfy a condition; rows that do not satisfy the condition are
dropped. The remaining rows are passed forward to the next step.
The condition can be almost anything that evaluates to true or false: a comparison, a string match, a
check against a list of values, the result of a function, or any combination of those joined with
and and or. Here is a typical example with several conditions chained
together:
SecurityEvent
| where TimeGenerated > ago(24h)
| where EventID == 4625
| where Account !endswith "$"
That query keeps only events from the last 24 hours, only event ID 4625 (Windows failed-logon), and only accounts that do not end with a dollar sign. The dollar sign at the end of an account name is the convention for machine accounts (computer objects in Active Directory), so this query is asking for human-account failed logons.
You might have noticed that there are three separate where clauses instead of one big one.
Both are valid, and the engine is smart enough to combine them internally. Many people prefer one filter
per line, like above, because it makes the query easier to read and easier to comment out individual
conditions while you are debugging. Either of these forms works:
// Style A: one where per line
SecurityEvent
| where TimeGenerated > ago(24h)
| where EventID == 4625
// Style B: combined with "and"
SecurityEvent
| where TimeGenerated > ago(24h) and EventID == 4625
Pick whichever you find easier to read. The performance difference is negligible; the readability difference is real.
Stacking where clauses is an implicit and; the pipe is telling KQL do this
and then do this. When using Style B to one-line your logic, be careful: if you use
or where you meant and, you may filter a lot less than intended.
For numbers, dates, and exact-match strings, use the standard comparison operators: == for
equals, != for not-equals, <, >, <=,
>=. Note that equality is a double equals sign - a single equals sign is the assignment
operator and you will see it inside extend and project-rename later.
To check whether a value is in a list, use in:
SecurityEvent
| where TimeGenerated > ago(1h)
| where EventID in (4624, 4625, 4634, 4648)
That is the same as writing
EventID == 4624 or EventID == 4625 or EventID == 4634 or EventID == 4648, only shorter and
easier to maintain. You can negate it with !in. There is also a case-insensitive variant
in~ that I will return to in the next section, when we talk about strings.
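Both variants in one sketch (the ActionType values here are just illustrative):

// Negated membership: drop common logoff noise
| where EventID !in (4634, 4647)
// Case-insensitive membership for strings
| where ActionType in~ ("logonsuccess", "logonfailed")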
If a list is long or reused in several places, that is what let is for. I will get to let in detail
later, but here is a sneak peek so the idea sits in the back of your mind:
let AuthEvents = dynamic([4624, 4625, 4634, 4648]);
SecurityEvent
| where TimeGenerated > ago(1h)
| where EventID in (AuthEvents)
The dynamic([...]) wrapper turns a comma-separated list into an array value that
in knows how to use. You will see dynamic a lot when we get to columns that
hold structured data.
Most of the columns in security data are strings: account names, file paths, command lines, URLs, IP addresses (yes, often as strings), and so on. KQL gives you a lot of string operators, and the differences between them matter for both correctness and performance. This is also one of the most common places where new KQL users write something that "works" on their dev data and falls over later.
Let us walk through them in order from cheapest to most expensive. The mental picture you should keep is that some operators can use the engine's index, which means they look at very little data, while others have to scan every row, which means they read everything.
== and =~ for exact match
The double-equals operator == is case-sensitive:
"Administrator" == "administrator" is false. The squiggle-equals operator
=~ is case-insensitive: "Administrator" =~ "administrator" is true.
Microsoft's official guidance, taken straight from their best-practices documentation, is to use
=~ rather than wrapping a column in tolower() before comparing. The reason is
that tolower(Account) == "administrator" applies a function to the column, which prevents
the engine from using its index - every row has to be lowercased before comparison. Using
=~ tells the engine "do a case-insensitive comparison" and lets it stay efficient.
// Microsoft recommends this:
| where Account =~ "corp\\bryce"
// Not this - slower because the function blocks index use:
| where tolower(Account) == "corp\\bryce"
Also note the doubled backslash in the string. KQL string literals use the backslash as an escape
character, so a single backslash needs to be written as two backslashes. If you want to avoid this and
write the path more naturally, use a "raw" string by prefixing with the at-sign:
@"corp\bryce".
has versus contains

This is the most important distinction in KQL string operators. They look almost identical and people use them interchangeably, but they behave very differently and one of them is much faster than the other.
The has operator looks for a whole-word match, which means it works against the engine's
word index and is fast. The contains operator looks for the substring anywhere - even in
the middle of another word - which means it cannot use the index and has to scan every row. For columns
like CommandLine, which can be megabytes of text per row across millions of rows, the
difference between has and contains is the difference between a query that
finishes in seconds and one that times out.
The catch is that has only matches whole "terms," where a term is roughly an alphanumeric
run separated by punctuation or whitespace. So CommandLine has "powershell" will match
"powershell.exe -enc ABCD" because powershell is a separate term inside that
string. But CommandLine has "shell" will not match "powershell.exe",
because shell is glued to power with no separator. If you needed to find
"shell" anywhere, including embedded in other words, that is when you would reach for
contains.
// Fast: uses the term index
| where CommandLine has "mimikatz"
// Also fast: case-sensitive variant
| where CommandLine has_cs "Mimikatz"
// Slow: full scan, only use when you really need substring match
| where CommandLine contains "shel"
The default is case-insensitive. If you want case-sensitive, append _cs to almost any
string operator: has_cs, contains_cs, startswith_cs, and so on.
When you filter on dynamic (JSON) columns, Microsoft's best-practices guidance is to apply a
has filter first, and only then dig into the JSON. Their example is "where DynamicColumn has \"Rare value\" | where DynamicColumn.SomeKey == \"Rare value\"" rather than "where DynamicColumn.SomeKey == \"Rare value\"" alone. The first form filters out
most rows cheaply with has and only does JSON parsing on the remainder; the second form
parses every row's JSON.
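The same guidance as a pipeline fragment you can adapt - DynamicColumn and SomeKey are placeholders for your actual column and key:

// Cheap indexed term filter first...
| where DynamicColumn has "Rare value"
// ...then the JSON path comparison runs only on the survivors
| where DynamicColumn.SomeKey == "Rare value"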
has_any and has_all
When you want to match any of a list of terms, do not write a long chain of or conditions.
Use has_any:
let Suspicious = dynamic(["mimikatz", "rubeus", "impacket", "bloodhound"]);
SecurityEvent
| where TimeGenerated > ago(1d)
| where CommandLine has_any (Suspicious)
The has_all variant requires that every term in the list be present. You will use
has_any far more often than has_all. Both still use the term index, so they
are fast.
startswith and endswith
Self-explanatory: startswith matches a prefix and endswith matches a suffix.
Both are reasonably efficient. The classic use case for endswith is filtering out machine
accounts, which always end in $: where Account !endswith "$". For
startswith, a typical use is filtering by a domain prefix:
where Account startswith "CORP\\".
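The two combine naturally - for example, human accounts in one domain:

SecurityEvent
| where TimeGenerated > ago(1h)
| where Account startswith "CORP\\" and Account !endswith "$"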
matches regex

This is the operator of last resort. It supports proper regular expressions and will match anything you can describe in regex syntax, but it cannot use any index - it has to evaluate the regex against every row. Reach for it only when no other operator will do.
// Look for a 32-character hex string anywhere in the URL
| where RemoteUrl matches regex @"\b[a-f0-9]{32}\b"
Notice the @ prefix on the string. That is the raw-string syntax - it tells KQL to treat
the backslashes as literal backslashes rather than escape sequences. You almost always want raw strings
for regex patterns; otherwise you end up writing "\\b[a-f0-9]\\{32\\}\\b" and going slowly
mad.
not function
Certain operators - has_any, to name one - do not let you prepend a
! (e.g. !has_any is not valid KQL). In those cases, you
can use the not() function to reverse the logic of a boolean expression. In more literal
terms, it tells the query to return data "where X is not true".
// Where the following statement is not true (or false)
| where not(Account has_any ('$','NT AUTHORITY','DWM'))
You should use the not() function when you need to negate complex logical expressions,
multiple grouped conditions, or scalar functions that do not have a dedicated negative operator built
into KQL.
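not() also works around any grouped condition, which is handy when the negation applies to the combination rather than to a single operator - a small sketch:

// Keep everything except successful network logons
SecurityEvent
| where TimeGenerated > ago(1h)
| where not(EventID == 4624 and LogonType == 3)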
| Operator | What it does | Speed | Typical use |
|---|---|---|---|
| == | Exact equality, case-sensitive | Fast (indexed) | EventID, numeric IDs, exact known strings |
| =~ | Exact equality, case-insensitive | Fast | Account names, file names where casing varies |
| has | Whole-word substring | Fast (indexed) | Searching CommandLine, RemoteUrl, etc. |
| has_any / has_all | Multiple terms | Fast | Watch lists of suspicious tokens |
| startswith / endswith | Prefix / suffix match | Fast-ish | Domain prefixes, file extensions |
| contains | Any substring, anywhere | Slow (scan) | Last resort when has won't match |
| matches regex | Full regex pattern | Slowest | When you absolutely need a pattern |
| in / !in | Membership in a list | Fast | EventID lists, allow/deny lists |
| not() | Reverses the boolean result | Operator dependent | Complex logical expressions or when there is no dedicated negative operator |
project
Tables in Sentinel can have a hundred columns, most of which are empty for any given event. Once you
have filtered to the rows you care about, the next thing you usually want to do is narrow down to the
columns you care about. That is what project is for. It selects a subset of columns and
discards the rest. As a side effect, it also reorders the columns to whatever order you list them in.
SecurityEvent
| where TimeGenerated > ago(1h) and EventID == 4624
| project TimeGenerated, Computer, Account, IpAddress, LogonType
Why bother projecting? Two reasons. First, your output is much easier to read when you are not staring at 90 empty columns. Second, projecting early makes downstream operations faster - every step from this point on only has to carry five columns through, not ninety. Microsoft's official guidance is to project as early as possible in the pipeline.
There are several variants of project that handle slightly different cases. The basic one
we just saw selects columns. The others are about renaming, dropping, and reordering without losing the
rest. Here is the whole family in one example:
DeviceProcessEvents
| where TimeGenerated > ago(1h)
| project-keep TimeGenerated, DeviceName, AccountName, FileName, ProcessCommandLine, SHA256
// project-keep selects columns like project does, but preserves their
// original order instead of reordering them to match your list
| project-rename Host = DeviceName, User = AccountName, Cmd = ProcessCommandLine
// project-rename changes column names without dropping the others
| project-away SHA256
// project-away removes specific columns and keeps the rest
You will mostly use plain project early on. As your queries grow,
project-rename becomes valuable because it lets you give columns more meaningful names
without rewriting half the query. project-away is useful late in a pipeline when you want
to drop a few "scratch" columns you used for intermediate calculations.
Once a column has been dropped by a project, it is gone for the rest of the pipeline. If
you then write where DroppedColumn == "x" you will get an error saying the column does not
exist. This sounds obvious but trips people up because the original table still has that column - it is
just no longer in the row-set being passed to the next step. Remember that every step's output is a
complete table on its own; you only see what you projected.
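A quick way to see the failure mode for yourself:

SecurityEvent
| where TimeGenerated > ago(1h)
| project TimeGenerated, Computer
| where Account == @"CORP\bryce"   // errors: 'Account' was projected away one step earlier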
extend
Where project picks columns that already exist, extend creates new columns by
computing values from existing ones. This is where you start doing real work - extracting the file name
from a full path, flagging events that match some condition, or computing time differences.
SecurityEvent
| where TimeGenerated > ago(1h) and EventID == 4688
| extend HourOfDay = datetime_part("hour", TimeGenerated)
| extend IsAfterHours = HourOfDay < 7 or HourOfDay > 19
That query adds two new columns: HourOfDay, which extracts the hour from the timestamp, and
IsAfterHours, which is a boolean true/false based on whether the hour falls outside
business hours. Now any later steps in the pipeline can filter or summarize using these new columns. You
can stack as many extend clauses as you need, and you can compute multiple columns in a
single extend by separating them with commas:
extend
HourOfDay = datetime_part("hour", TimeGenerated),
IsAfterHours = datetime_part("hour", TimeGenerated) !between (7 .. 19),
LeafFileName = tostring(split(NewProcessName, "\\")[-1])
That third column is worth pausing on because it shows a pattern you will use often. The
NewProcessName column on a 4688 event holds a full Windows path like
C:\Windows\System32\cmd.exe. The split function breaks that path on the
backslash separator, returning an array. Indexing with [-1] gives you the last element,
which is the file name. Wrapping the whole thing in tostring converts it from a dynamic
value to a plain string. The result is just cmd.exe, which is much easier to filter and
summarize on than the full path.
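You can test the extraction pattern in isolation with print before wiring it into a pipeline:

print LeafFileName = tostring(split(@"C:\Windows\System32\cmd.exe", @"\")[-1])
// returns cmd.exe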
Note that you cannot reference a column inside the same extend statement that creates it. In
the above example, if you attempted to use HourOfDay in the IsAfterHours expression
of that same extend, it would return an error saying the column does not exist. To use an extended column
in another expression, extend it in its own statement first:
| extend HourOfDay = datetime_part("hour", TimeGenerated)
| extend IsAfterHours = HourOfDay !between (7 .. 19)
iff and case
Two functions you will see constantly inside extend are iff and
case. The iff function is a simple if/then/else:
iff(condition, valueIfTrue, valueIfFalse). You can also write iif; there is no
functional difference, it is purely an alias. The case function is a multi-branch version:
case(cond1, val1, cond2, val2, ..., defaultVal).
| extend Severity = case(
EventID == 4625, "medium",
EventID == 4624 and LogonType == 10, "high",
EventID in (4672, 4720, 4732), "high",
"low"
)
Each pair is a condition followed by the value to return when that condition is true. The conditions are evaluated top to bottom and the first match wins. The last argument, with no condition before it, is the default - what gets returned if none of the conditions matched.
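For a simple two-way split, iff is more compact than case - a minimal sketch using the machine-account convention from earlier:

| extend AccountType = iff(Account endswith "$", "machine", "human")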
summarize
Up to this point, every operator we have seen returns a row-by-row transformation of the input. The
output has the same general shape as the input, just with fewer rows or different columns.
summarize is the first operator that fundamentally changes the shape of the data. It groups
rows together and produces one output row per group, replacing the original rows with a summary value.
The simplest use of summarize is just counting all the rows:
SecurityEvent
| where TimeGenerated > ago(1h) and EventID == 4625
| summarize count()
That returns exactly one row with one column - the total number of failed logons in the last hour.
Useful, but boring. The interesting form is summarize ... by ..., which says "give me one
row per distinct value of these columns":
SecurityEvent
| where TimeGenerated > ago(1h) and EventID == 4625
| summarize Failures = count() by Account, Computer
The output has three columns: Account, Computer, and Failures.
There is one row per unique (Account, Computer) pair, and Failures is the
count of failed logons for that pair. Notice the Failures = count() bit - that is naming
the aggregated column. Without the assignment, the column would be named count_, which is
fine but ugly. Always name your aggregates.
Inside the summarize, you can compute several different summary values at once. The most
useful aggregation functions in security work are these:
| Function | What it returns |
|---|---|
| count() | Number of rows in the group. |
| countif(predicate) | Number of rows where the predicate is true. Saves you from running two summarize branches. |
| dcount(column) | Approximate distinct count. Very efficient even on huge data. |
| min(column) / max(column) | Smallest / largest value. Common for first-seen / last-seen times. |
| sum(column) / avg(column) | Total / average of a numeric column. |
| take_any(column) | Any single value of the column from the group. Cheaper than min/max when you just need a sample. |
| arg_max(column, *) | The row with the largest value of column, with * meaning "keep all the other columns from that row too." This is hugely useful and we will return to it. |
| make_set(column) | A deduplicated array of all values of the column in the group. |
| make_list(column) | An ordered array of all values, possibly with duplicates. |
Here is a query that uses several of these together. It produces, for each user account in the last day, the total number of process executions, the number of distinct devices they ran processes on, the first time we saw them, the last time we saw them, and a sample of up to ten distinct file names:
DeviceProcessEvents
| where TimeGenerated > ago(1d)
| where AccountName !endswith "$" and isnotempty(AccountName)
| summarize
Executions = count(),
DeviceCount = dcount(DeviceName),
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated),
SampleFiles = make_set(FileName, 10)
by AccountName
| order by Executions desc
That last argument inside make_set, the 10, caps the size of the resulting
array. Without a cap, a noisy account could produce a make-set with thousands of file names, which is
rarely what you want. Always cap your make_set and make_list calls.
arg_max trick
Of all the aggregation functions, arg_max is probably the one that will most change how you
write queries once you discover it. The signature is arg_max(column, *), where
column is the value you want to maximize and * means "keep all the other
columns from the winning row." It answers the question "for each group, give me the latest row, with
everything that was on that row."
// Per device, the most recent network adapter info available
DeviceNetworkInfo
| where TimeGenerated > ago(7d)
| summarize arg_max(TimeGenerated, *) by DeviceName
That returns one row per device, with all of DeviceNetworkInfo's columns, taken from the
most recent row for that device. Without arg_max, doing this would require either a join or
a window function. With arg_max, it is one line.
bin()
The bin function is a small but extremely useful one. It takes a value and rounds it down
to the nearest multiple of some unit. For datetime values, that means rounding down to the start of an
hour, day, or any other interval. The killer use is grouping events by time bucket.
SecurityEvent
| where TimeGenerated > ago(24h) and EventID == 4625
| summarize Failures = count() by bin(TimeGenerated, 5m)
| order by TimeGenerated asc
That gives you the count of failed logons in each 5-minute bucket over the last 24 hours, in
chronological order. You can plot the result as a time series in Sentinel directly. Adjust the bucket
size to your question - 1m for fine-grained timelines, 1h for daily patterns,
1d for week-long patterns.
You can combine time bucketing with grouping by other columns:
SecurityEvent
| where TimeGenerated > ago(24h) and EventID in (4624, 4625)
| summarize
Successes = countif(EventID == 4624),
Failures = countif(EventID == 4625)
by Computer, bin(TimeGenerated, 15m)
| order by Computer asc, TimeGenerated asc
Now you have one row per (Computer, 15-minute window) with separate counts of successes and
failures. Notice the countif trick - it counts only the rows where its condition is true.
Without it, you would have to either run two separate summarize queries and join them, or
split the data and union it back, both of which are uglier and slower.
You have already met take, which gives you "any N rows" with no ordering guarantee. The two
cousins of take are top and order by (which is the same as
sort by, just spelled differently).
The order by operator sorts the entire row-set by one or more columns. It takes
asc for ascending or desc for descending; the default is descending. Multiple
columns are separated by commas (order matters):
DeviceProcessEvents
| where TimeGenerated > ago(1h)
| summarize n = count() by DeviceName, FileName
| order by n desc, DeviceName asc
The top operator combines sorting and limiting in one step: it sorts and then takes the top
N. Use it when you only want the top so many rows by some metric:
| top 10 by n desc
That is functionally equivalent to order by n desc | take 10, but it expresses the intent
more clearly. It is also slightly more efficient because the engine can stop early once it has the top
N.
While you are still exploring, it is a good habit to end every query with
| limit 100 (a synonym for take) or | count.
Without it, on an unfamiliar dataset, your query may return gigabytes of rows before you realize you
forgot to filter something. Adding the limit is a cheap insurance policy while you are still feeling out
a new table.
This section is one of the most important in the whole guide, because it explains a class of bug that confuses new and experienced KQL users alike. The same conceptual idea - "what user logged on?" or "what was the logon type?" - is represented differently in different tables, sometimes in subtle ways that will silently produce wrong results if you do not know about them.
The clearest example is SecurityEvent versus DeviceLogonEvents. Both record
logon-related activity. SecurityEvent is sourced from the older Microsoft Monitoring Agent
reading the Windows Security event log; DeviceLogonEvents is sourced from Microsoft
Defender for Endpoint. They overlap heavily but they are not the same table, and the schemas are not
identical.
Look carefully at how each table represents LogonType:
| Table | Column name | Data type | Example value |
|---|---|---|---|
| SecurityEvent | LogonType | int | 2, 3, 10 |
| DeviceLogonEvents | LogonType | string | "Interactive", "Network", "RemoteInteractive" |
If you write where LogonType == 2 against SecurityEvent, you correctly get
interactive logons. If you write the same thing against DeviceLogonEvents, you will get
nothing - not because there are no interactive logons, but because the comparison
"Interactive" == 2 is always false, and KQL will not even warn you. The query runs cleanly
and returns zero rows, which is the worst possible failure mode.
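Side by side, the correct filter for each table:

// SecurityEvent: LogonType is an int
SecurityEvent
| where TimeGenerated > ago(1h) and LogonType == 10
// DeviceLogonEvents: LogonType is a string
DeviceLogonEvents
| where TimeGenerated > ago(1h) and LogonType == "RemoteInteractive"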
The same pattern applies to account names. SecurityEvent has a unified
Account column that looks like "CORP\\bryce" - domain and username glued
together with a backslash. DeviceLogonEvents and DeviceProcessEvents instead
have AccountDomain and AccountName as separate columns, with no backslash
anywhere. If you have a query that filters on Account =~ "corp\\bryce" and you change the
source table to DeviceProcessEvents without updating the filter, you will again get zero
rows and no error.
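If you need the combined form on the Defender side - say, to compare against SecurityEvent - you can rebuild it with strcat:

DeviceProcessEvents
| where TimeGenerated > ago(1h)
| extend Account = strcat(AccountDomain, @"\", AccountName)
| where Account =~ @"corp\bryce"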
From the field: one query I reviewed returned zero rows whenever its EventType input array was empty:
where "FileAccess" in~ (EventType) against an empty EventType array silently
returns nothing. That is a legitimate design choice - it acted as a safety net when run with no scoping
arguments - but it also means the query "works" in the sense that it does not error, while producing a
zero-row output that an analyst could mistake for "no findings." Empty results are not the same as no
findings; they are no answer. Always sanity-check that your filters are not silently filtering away
everything.
The defense against these types of issues is to know the schemas of the tables you query most. Run
TableName | getschema the first time you touch a new table, and again whenever a query
returns zero rows when you expected some. Keep a small reference card for the tables you use most if you
feel it is necessary. The appendix at the end of this document has compact schema cards for the tables
you will encounter day to day.
union and join
So far every query has touched a single table. Real investigations often span multiple tables, because
the data you need is spread across them - logons in one table, process executions in another, network
activity in a third. KQL gives us two operators for combining tables: union, which stacks
tables on top of each other, and join, which links rows from one table to rows from another
based on a key.
union
Use union when you want all the rows from multiple tables together as if they came from
one. The tables do not need to have the same schema; columns that are in one table but not another will
simply be empty for rows that came from the other table. The most common use is when the same conceptual
data lives in two tables - for instance, both SecurityEvent (EventID 4624) and
DeviceLogonEvents record successful logons, but from different sources.
union
(SecurityEvent | where TimeGenerated > ago(1h) and EventID == 4624
| project TimeGenerated, Computer, Account, IpAddress, Source = "SecEvent"),
(DeviceLogonEvents | where TimeGenerated > ago(1h) and ActionType == "LogonSuccess"
| project TimeGenerated, Computer = DeviceName, Account = AccountName, IpAddress = RemoteIP, Source = "DefenderEDR")
| order by TimeGenerated desc
Notice that I projected each side to the same column names before unioning. That is not strictly
required - union will tolerate mismatched columns - but it makes the result much cleaner.
Notice also the added Source column, which lets you tell which table each row came from.
This is a good habit; it costs nothing and you will be glad of it five minutes later when you wonder why
a row looks weird.
join and the trap that catches everyone
The join operator merges two tables based on matching values in one or more columns, like
SQL joins. The general syntax is:
LeftTable
| join kind=<kind> (RightTable) on Key
The kind= parameter is critical. There are several kinds of join (9 to be exact):
inner (only matching rows from both sides), leftouter (all left rows, matching
right or null), leftanti (left rows with no match in right), and a few more. I will cover
the most useful ones in a moment. But first, the trap.
If you write LeftTable | join (RightTable) on Key with no kind= specified, KQL
does not default to a SQL-style inner join. It defaults to a kind called
innerunique, which silently de-duplicates the left side on the join key before joining.
This means duplicate left rows are dropped before you ever see them, and your query returns fewer rows
than you expected, with no error and no warning.
Always specify kind= explicitly. If you want a SQL inner join, write
kind=inner. If you want all left rows, write kind=leftouter. The one thing you
almost certainly do not want is the default.
With that warning recorded, here are the joins you will use most. The mental model that helps is to picture two columns of rows next to each other, and ask: "for each row on the left, what should happen?"
| kind= | For each left row... | Use it for |
|---|---|---|
| inner | Keep it once for every matching right row; drop if no match | True intersection of two sets |
| leftouter | Keep it; matched right columns or null | Enrichment - adding optional context |
| leftanti | Keep it only if there is no match in the right | "First time seen" - events not in your baseline |
| innerunique | De-dup left first, then inner join | The default; usually not what you want |
A very common pattern is "I have some events, and I want to attach extra context from another table."
The right operator for this is almost always kind=leftouter:
// Logons enriched with the most recent device info per device
let RecentDeviceInfo = DeviceNetworkInfo
| where TimeGenerated > ago(7d)
| summarize arg_max(TimeGenerated, *) by DeviceName
| project DeviceName, IPAddresses, MacAddress;
DeviceLogonEvents
| where TimeGenerated > ago(1h) and ActionType == "LogonSuccess"
| join kind=leftouter (RecentDeviceInfo) on DeviceName
| project TimeGenerated, DeviceName, AccountName, LogonType, IPAddresses, MacAddress
Read this slowly. You first define a small lookup table called RecentDeviceInfo with one
row per device, holding only the columns you want to attach. Then you take the actual logon events, do a
left-outer join to that lookup, and project the fields you care about. Each logon event keeps everything
it had before and gains the device's most recent IP and MAC address. If the device is brand new and not
in the lookup table, those fields will be empty rather than dropping the row - that is the "leftouter"
behavior in action.
One of the most useful patterns in detection engineering is "show me the things I have not seen before."
The leftanti join makes this clean:
// Process executions in the last hour that did not appear anywhere in the last 30 days
let Baseline = DeviceProcessEvents
| where TimeGenerated between (ago(30d) .. ago(1h))
| distinct DeviceName, FileName, SHA256;
DeviceProcessEvents
| where TimeGenerated > ago(1h)
| join kind=leftanti (Baseline) on DeviceName, FileName, SHA256
Each row in the recent events is kept only if there is no matching row in the baseline. The result is
the set of "new" things - process executions on a host running with a hash combination that has never
been seen on that host before in the last 30 days. This is much cleaner than trying to do it with
!in against a giant in-memory list.
Sooner or later you will need to extract a value out of a string column that does not have its own
dedicated column. The classic example is the EventData column on
SecurityEvent: it contains the raw XML payload of the underlying Windows Security event,
with all the field-by-field details that did not get promoted to top-level columns. To get at any of
those details, you have to parse.
KQL has several tools for this, ranging from "easy and fast" to "powerful but slow." In rough order:
parse - easy when the data has a stable shape
The parse operator works when the source string follows a stable pattern of literal text
and variable values. You write the literal text in quotes and the variable parts as bare names, and KQL
extracts them. The three examples below cover the most common use cases.

This is the simplest execution of parse: you are looking solely for the value between " from " and " port ". In this case that is the source IP, so we name it accordingly.
Syslog
| where TimeGenerated > ago(1h) and ProcessName == "sshd-session"
| parse SyslogMessage with * " from " SourceIP " port " *
| project TimeGenerated, Computer, SourceIP
This is a similar but more specific parse: instead of extracting one value, you are extracting three. It reads more like a sentence than the previous example - all of the literal strings ("for ", " from ", " port ", " ssh2:") have to exist, in exactly the order you typed them, for the data to be parsed out. Any typographical error, or a SyslogMessage that does not contain the full sentence, means nothing is parsed for that row.
Syslog
| where TimeGenerated > ago(1h) and ProcessName == "sshd-session"
| parse SyslogMessage with * "for " User " from " SourceIP " port " Port " ssh2:" *
| project TimeGenerated, Computer, User, SourceIP, Port
The last example has two subtle differences. First, the * was removed from the end of the parse
statement; do this whenever the value you are parsing out runs to the end of the data,
otherwise you will receive an error for ambiguity. Second, :string and
:int were added to the User and Port variables inside the parse statement.
This casts the values explicitly during extraction; it is not necessary, but depending on the data
set, it can be useful.
Syslog
| where TimeGenerated > ago(1h) and ProcessName == "sshd-session"
| parse SyslogMessage with * "for " User:string " from " SourceIP " port " Port:int " ssh2: " PublicKey
| project TimeGenerated, Computer, User, SourceIP, Port, PublicKey
A leading asterisk means "skip everything up to the next literal," and a trailing asterisk means "ignore
anything after the last literal." The string fragments are the literal text you expect to see, and
User, SourceIP, and Port are the variable parts you want to extract.
Microsoft's official guidance is to prefer parse over multiple extract() calls
when the strings all follow the same format - it is one operator, the engine optimizes it together, and
it is easier to read.
extract and extract_all - when the format varies
When the string is not predictably formatted, regex is your tool. extract pulls out a
single capture group; extract_all pulls out all of them.
| extend Host = extract(@"https?://([^/:]+)", 1, RemoteUrl)
The first argument is the regex pattern, the second argument is which capture group to return (1-indexed), and the third argument is the column to extract from. The result is the captured text or an empty string if there is no match.
For Sysmon events forwarded into the SecurityEvent table, the field-level data lives inside
the EventData string as XML. There are two common ways to get at it. The first is to use
extract with a regex against the raw string, which is robust and easy to reason about. The
second is to use parse_xml followed by mv-apply with bag_pack to
construct a key-value bag of every <Data Name="..."> element. Here is the second
pattern, in the form I have used for Sysmon process-access events:
SecurityEvent
| where TimeGenerated > ago(1h) and EventID == 10
| extend CleanXml = replace_string(
EventData, @' xmlns="http://schemas.microsoft.com/win/2004/08/events/event"', ""
)
| extend ParsedXml = parse_xml(CleanXml)
| extend DataFields = ParsedXml.EventData.Data
| mv-apply Field = DataFields on (
extend Key = tostring(Field['@Name']), Value = Field['#text']
| summarize PackedBag = make_bag(bag_pack(Key, Value))
)
| evaluate bag_unpack(PackedBag, "Unpacked_")
| project TimeGenerated, Computer,
SourceImage = tostring(Unpacked_SourceImage),
TargetImage = tostring(Unpacked_TargetImage),
GrantedAccess = tostring(Unpacked_GrantedAccess),
CallTrace = tostring(Unpacked_CallTrace)
This is more advanced than where you are right now, and that is fine. The reason it appears here is so
you have a complete worked example to come back to once you start touching Sysmon data seriously. Read
the steps and trust that they will become clear with practice. The high-level shape is: parse the XML,
walk over each Data element, build a property bag mapping Name → value, and
then read fields out of the bag by name.
mv-expand
You have just seen mv-apply. Its simpler cousin is mv-expand, which takes a
single row containing an array and turns it into multiple rows, one per element. If a row has columns
Host = "WS01" and Tags = ["a","b","c"], then
mv-expand Tags produces three rows, all with Host = "WS01", and
Tags set to "a", then "b", then "c". This is
invaluable when a column holds a list and you want to filter on individual elements.
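You can reproduce that exact Host/Tags example with datatable, which fabricates a small inline table - useful for testing an operator in isolation:

datatable(Host: string, Tags: dynamic) [
    "WS01", dynamic(["a", "b", "c"])
]
| mv-expand Tags to typeof(string)
// three rows: (WS01, "a"), (WS01, "b"), (WS01, "c")

The same idea on real data: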
DeviceProcessEvents
| where TimeGenerated > ago(1h) and ProcessCommandLine has "powershell"
| extend Tokens = split(ProcessCommandLine, " ")
| mv-expand Token = Tokens to typeof(string)
| where Token matches regex @"^[A-Za-z0-9+/=]{200,}$"
| project TimeGenerated, Token, Tokens
That query splits each PowerShell command line on whitespace, expands one row per token, and then keeps
only tokens that look like long base64-ish blobs - a quick way to find encoded PowerShell payloads.
mv-expand is the bridge between a column-oriented world and a row-oriented one when the
column contains a list.
This section is the most concentrated value in the document. Each item here is a real error I have caught in past query work. Read these in advance and you will recognize them when they happen to you.
This one came up in a review for a MITRE ATT&CK association query and is worth understanding. The setup
was a query with a shared base - BaselineSecEvent - that was used to feed two downstream
sub-queries, FileOrObjectAccessEvents and SystemAccess. The original query
looked something like this in shape:
let FileOrObjectAccessEvents = materialize(
SecurityEvent
| where ... lots of common logic ...
| where EventID in (2, 6, 7, 11, 12, ...)
);
let SystemAccess = materialize(
SecurityEvent
| where ... lots of common logic, again ...
| where EventID in (4624, 4625, 4634, ...)
);
union FileOrObjectAccessEvents, SystemAccess
The issue: the materialize() function forces the query engine to execute a subquery exactly
once, cache its results in memory, and then reuse that cached data for any subsequent references within
the same query. Here, though, there were two separate materializes, so the big SecurityEvent
scan and all its parsing and lookups happened twice - once for each materialized sub-query. The fix was
to materialize the shared base once and let the sub-queries read from it:
let BaselineSecEvent = materialize(
SecurityEvent
| where TimeGenerated between (StartTime .. EndTime)
| where ... all the common logic, run once ...
);
let FileOrObjectAccessEvents = BaselineSecEvent | where EventID in (2, 6, 7, 11, ...);
let SystemAccess = BaselineSecEvent | where EventID in (4624, 4625, 4634, ...);
union FileOrObjectAccessEvents, SystemAccess
The lesson: materialize is cheap to use but expensive to misuse. When you have shared logic
that feeds multiple consumers, materialize the shared piece, not each consumer. Two materializes around
the same expensive base means doing the expensive work twice.
In the same review I caught a subtler issue. There was a lookup table mapping Windows Event IDs to MITRE
ATT&CK techniques. Event 4625 (failed logon) is associated with both T1110.001 (password guessing)
and T1110.002 (password cracking) - two techniques. When the lookup was done with a simple
join, only one of those mappings was kept; the other was silently dropped. The lesson: any
time a lookup key can map to multiple values in your reference table, a join will not return what you
might expect. Either explicitly handle the multi-row case, use make_set to gather all
matches, or restructure the lookup.
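A minimal sketch of the make_set approach, with a hypothetical two-row TechniqueMap standing in for the real reference table:

let TechniqueMap = datatable(EventID: int, Technique: string) [
    4625, "T1110.001",
    4625, "T1110.002"
];
SecurityEvent
| where TimeGenerated > ago(1d) and EventID == 4625
| join kind=inner (TechniqueMap) on EventID
| summarize Techniques = make_set(Technique) by Account, Computer
// Techniques holds ["T1110.001", "T1110.002"] - nothing silently dropped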
The same query had a guard pattern: input scoping arrays like EventType = dynamic([]) with
a filter where "FileAccess" in~ (EventType). When the array was empty, the filter would be
false for every row and the query would return zero results - not an error, just nothing. This is a
legitimate design as a safety net (so the query does not run unbounded), but it also means that "zero
results" can mean either "no findings" or "you forgot to populate the input." Always sanity-check that
your filters are not silently filtering away everything.
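If you instead want an empty input to mean "no filter" rather than "match nothing," make that explicit - a sketch, where MyEvents and its EventType column are hypothetical:

let EventTypes = dynamic([]);
MyEvents
| where array_length(EventTypes) == 0 or EventType in~ (EventTypes)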
The schema gotcha section above covered this in detail, but it is worth repeating in this list.
Comparing EventID == "4624" (string against an int column) works in some KQL contexts and
silently produces zero rows in others. Always match the type. If you are not sure of the type, run
YourTable | getschema.
From the same review: a literal string @"N\A" where the intent was "N/A". The
escape-aware string had a backslash where there should have been a forward slash, so anything you
compared to it would not match. There was no error; the comparison just always failed. When you have a
query that "looks right" but returns nothing, copy-paste a literal value from the data into your filter
rather than retyping it.
To consolidate everything I have covered, let us walk through a non-trivial query as if you had encountered it in a wiki and needed to understand it before running it. Here is the query:
let Window = 5m;
let Fails = SecurityEvent
| where TimeGenerated > ago(24h) and EventID == 4625
| where Account !endswith "$"
| project FailTime = TimeGenerated, Account, IpAddress, Computer;
let Wins = SecurityEvent
| where TimeGenerated > ago(24h) and EventID == 4624
| project WinTime = TimeGenerated, Account, IpAddress, Computer;
Fails
| join kind=inner (Wins) on Account, IpAddress, Computer
| where WinTime > FailTime and WinTime - FailTime <= Window
| summarize
FailCount = count(),
FirstFail = min(FailTime),
Success = min(WinTime)
by Account, IpAddress, Computer
| where FailCount >= 3
| order by Success desc
Walk through it line by line, saying "then" at each pipe. The query starts by defining a constant
Window = 5m - a five-minute time window you will use later. It then defines two named
subqueries with let: Fails is all 4625 failed-logon events from the last 24
hours, with the timestamp renamed to FailTime so you can tell it apart later;
Wins is the same shape but for 4624 successful logons, with WinTime.
Click the "Show solution" button to check if you read/interpreted the query right.
let the Window equal five minutes;
Then let Fails equal SecurityEvent
| where the TimeGenerated is greater than the time was 24 hours ago and the EventID is equal to 4625
| then take Accounts that do not end with "$"
| then project the columns where FailTime equals TimeGenerated, Account, IpAddress, and Computer;
Next, let Wins equal SecurityEvent
| then take events where the TimeGenerated is greater than the time was 24 hours ago and the EventID is equal to 4624
| then project the columns where WinTime equals TimeGenerated, Account, IpAddress, and Computer;
Next take Fails
| then join Fails and Wins where they have matching row values for the columns Account, IpAddress, and Computer
| then take the WinTimes that are greater than the FailTimes and where their difference is less than or equal to five minutes
| then summarize the count of failed logins, the first failure time, and the first successful login time by the Account, IpAddress, and Computer
| then take the FailCounts that are greater than or equal to three
| then order by Success (First successful login time) descending
The actual computation starts at the final, let-less block. Read it: "Take the Fails table,
then inner-join it to the Wins table on the combination of Account, IpAddress, and Computer. Then keep
only matched rows where the win came after the fail and the gap was at most five minutes. Then group by
Account, IpAddress, and Computer, counting fails and grabbing the earliest fail time and earliest
success time per group. Then keep only groups with three or more fails. Then sort by success time, most
recent first."
What this query does, in plain language: it finds cases where the same account, from the same IP, on the same machine, had three or more failed logons within a five-minute window followed by a successful one. That is a classic password-guessing-then-success pattern.
If you can read a query like that comfortably and predict roughly what its output will look like before running it, you are ready for some exercises.
These are intentionally easy. The goal is to use what you just read, not to test you. Each exercise has a worked solution below it. Try the exercise in your own environment first, and only look at the solution after you have something working - even if "working" means "does not error but I am not sure of the output."
Use SecurityEvent. EventID 4624 is a successful logon. Project just the time, computer,
account, and IP address. Sort so the most recent is at the top and limit to 20 rows.
SecurityEvent
| where TimeGenerated > ago(1h)
| where EventID == 4624
| project TimeGenerated, Computer, Account, IpAddress
| order by TimeGenerated desc
| take 20
Three things to notice about this solution. First, the time filter comes before the EventID
filter - time is always first. Second, the project happens before the
order by, which is good practice because it reduces the amount of data the engine
has to sort. Third, I used take at the very end after sorting, which is the safe
pattern. Using top 20 by TimeGenerated desc would also work and is slightly more
efficient.
Use SecurityEvent. EventID 4625 is a failed logon. Group by Account,
count, and only show accounts with at least 5 failures.
SecurityEvent
| where TimeGenerated > ago(24h) and EventID == 4625
| where Account !endswith "$"
| summarize Failures = count() by Account
| where Failures >= 5
| order by Failures desc
The !endswith "$" filter is the convention for excluding machine accounts. Notice
that I filter on the count after the summarize, which is the only place you can - the
Failures column does not exist before that line.
Use DeviceProcessEvents. For each (DeviceName, FileName) pair, give the
first time you saw it, the last time you saw it, and a count of executions.
DeviceProcessEvents
| where TimeGenerated > ago(1h)
| summarize
FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated),
Executions = count()
by DeviceName, FileName
| order by Executions desc
This is the bread-and-butter "what ran where" query. If you wanted only suspicious-looking
executables, add a filter on FileName after the summarize, or filter
InitiatingProcessFileName before the summarize to scope to specific parents. The
point of this exercise is just to get comfortable with multiple aggregates in one summarize.
Use SecurityEvent. Group failures into 5-minute buckets. The result should be one row
per bucket with a count.
SecurityEvent
| where TimeGenerated > ago(6h) and EventID == 4625
| summarize Failures = count() by bin(TimeGenerated, 5m)
| order by TimeGenerated asc
| render timechart
The render timechart at the end is a Sentinel-friendly way of saying "draw this as
a line chart." You can leave it off if you just want the table. Notice that the
bin result is named TimeGenerated by default - the binning preserves
the source column name, which is convenient because it lines up with the chart axis.
Use DeviceProcessEvents and look at ProcessCommandLine. Use
has_any with a list. Project just the time, device, account, and command line.
let Suspicious = dynamic(["Invoke-Expression", "DownloadString", "FromBase64String"]);
DeviceProcessEvents
| where TimeGenerated > ago(1d)
| where FileName =~ "powershell.exe"
| where ProcessCommandLine has_any (Suspicious)
| project TimeGenerated, DeviceName, AccountName, ProcessCommandLine
| order by TimeGenerated desc
Three things to notice. First, has_any takes a parenthesized argument that is the
array. Second, you filter on FileName =~ "powershell.exe" before the
has_any filter - this narrows the rows the more expensive string match has to look
at. Third, we used =~ for the file name because we do not care about case.
Show the last 20 logons from DeviceLogonEvents in the last hour, joined to
DeviceNetworkInfo to show the device's most recent IP address.
let RecentInfo = DeviceNetworkInfo
| where TimeGenerated > ago(7d)
| summarize arg_max(TimeGenerated, *) by DeviceName
| project DeviceName, IPAddresses, MacAddress;
DeviceLogonEvents
| where TimeGenerated > ago(1h) and ActionType == "LogonSuccess"
| join kind=leftouter (RecentInfo) on DeviceName
| project TimeGenerated, DeviceName, AccountName, LogonType, IPAddresses
| order by TimeGenerated desc
| take 20
Two things to notice. First, we explicitly wrote kind=leftouter. You do not want
the default innerunique; we want to keep every logon event whether or not we have a
matching device-info row. Second, the lookup table RecentInfo has exactly one row
per device thanks to the arg_max summarize, so the join cannot fan out into
duplicates.
Use DeviceProcessEvents. Build a 30-day baseline of
(DeviceName, FileName, SHA256), then find executions in the last hour that do not match
anything in the baseline.
let Baseline = DeviceProcessEvents
| where TimeGenerated between (ago(30d) .. ago(1h))
| distinct DeviceName, FileName, SHA256;
DeviceProcessEvents
| where TimeGenerated > ago(1h)
| join kind=leftanti (Baseline) on DeviceName, FileName, SHA256
| project TimeGenerated, DeviceName, AccountName, FileName, ProcessCommandLine, SHA256
| order by TimeGenerated desc
This is the "first time seen on this host" pattern using leftanti. It is one of the
most useful detection idioms in KQL. Notice that the baseline window explicitly stops at
ago(1h), not at now() - we want to compare recent events to a baseline
that does not already include those recent events.
Group these in your head by what they do, not alphabetically:
| Goal | Operators |
|---|---|
| Filter rows | where, distinct, take, top, limit |
| Pick / rename / drop columns | project, project-keep, project-rename, project-away, project-reorder |
| Add computed columns | extend |
| Group and aggregate | summarize ... by ... |
| Sort | order by (synonym sort by) |
| Combine tables | union, join kind=..., lookup |
| Parse strings | parse, parse_json, parse_xml, extract, extract_all, split |
| Work with arrays | mv-expand, mv-apply, make_set, make_list |
| Time math | ago(), now(), bin(), datetime_part(), format_datetime() |
| Render results | render timechart, render barchart, render columnchart |
These are condensed reference cards. Bold columns are the ones you will reach for almost every query.
DeviceLogonEvents: LogonType is a string here ("Interactive", "Network", "RemoteInteractive"...) unlike SecurityEvent. Extra event detail lands in AdditionalFields as JSON.

Syslog: the raw message payload lives in SyslogMessage; you will parse this with parse or extract.
| EventID | Meaning |
|---|---|
| 4624 | An account was successfully logged on |
| 4625 | An account failed to log on |
| 4634 / 4647 | An account was logged off |
| 4648 | A logon was attempted using explicit credentials |
| 4672 | Special privileges assigned to new logon |
| 4688 | A new process has been created |
| 4697 | A service was installed |
| 4720 / 4732 / 4738 | Account created / member added to group / account changed |
| 4768 / 4769 / 4771 | Kerberos TGT / service ticket / pre-auth failure |
| 4776 | NTLM credential validation |
| 1102 | The audit log was cleared |
| Code | Name | What it means |
|---|---|---|
| 2 | Interactive | Console logon (someone typed their password at the keyboard) |
| 3 | Network | Network access to shared resources (most common type) |
| 4 | Batch | Scheduled tasks |
| 5 | Service | Service account starting a service |
| 7 | Unlock | Workstation unlock |
| 8 | NetworkCleartext | Network logon with cleartext credentials (suspicious) |
| 9 | NewCredentials | RunAs with /netonly |
| 10 | RemoteInteractive | RDP |
| 11 | CachedInteractive | Cached domain credentials (laptop offline) |