
Parsing and Analyzing Server Log Files

You have a CSV export of server log entries. Maybe you pulled them from CloudWatch, maybe someone ran a script that dumped the last 50,000 lines into a spreadsheet. Each row has a single column with a raw log line like:

2024-03-15 09:23:45 ERROR [auth-service] Failed login for user@example.com from 192.168.1.1

That is a wall of text. You cannot filter by log level, you cannot group by service, you cannot chart errors over time. You need to break this apart into structured columns first, and then you can actually analyze it.

Here is how to go from raw log lines to structured, filterable data in ExploreMyData using seven pipeline steps.

Step 1: Extract the timestamp

Open the toolbar and select Extract Text from the Transform group. This operation pulls substrings out of a text column using patterns.

  1. Source column: log_line (or whatever your raw column is called)
  2. Method: regex
  3. Pattern: \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}
  4. New column name: timestamp_raw

This regex matches the standard datetime format at the start of each log line. The result is a new VARCHAR column containing just the timestamp string, like 2024-03-15 09:23:45.

| log_line (raw)                                                              | timestamp_raw (extracted) |
| --------------------------------------------------------------------------- | ------------------------- |
| 2024-03-15 09:23:45 ERROR [auth-service] Failed login for user@example.com   | 2024-03-15 09:23:45       |
| 2024-03-15 09:24:02 INFO [api-gateway] Request completed in 142ms            | 2024-03-15 09:24:02       |
| 2024-03-15 09:24:18 WARN [db-connector] Slow query: 3.2s on users table      | 2024-03-15 09:24:18       |
| 2024-03-15 09:25:01 ERROR [auth-service] Token expired for session abc123    | 2024-03-15 09:25:01       |

Regex \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} pulls the timestamp from each line. The raw log_line column is kept until all extractions are done.
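If you want to sanity-check the pattern before typing it into the dialog, the same extraction can be sketched in plain Python with the standard `re` module (this is an illustrative analog, not the SQL the tool generates):

```python
import re

# Sample line from the export; the pattern is the one entered in Extract Text.
line = "2024-03-15 09:23:45 ERROR [auth-service] Failed login for user@example.com from 192.168.1.1"

match = re.search(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", line)
timestamp_raw = match.group(0) if match else None
print(timestamp_raw)  # → 2024-03-15 09:23:45
```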

Step 2: Extract the log level

Apply Extract Text again. The log level (ERROR, WARN, INFO, DEBUG) sits between the timestamp and the bracketed service name. You have two options:

  • Regex: pattern (ERROR|WARN|INFO|DEBUG)
  • Between delimiters: after the timestamp's trailing space and before the [ character

Either way, name the result log_level. The regex approach is more reliable here because it matches the known set of levels regardless of extra whitespace.
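As a quick sketch of why the alternation pattern is robust, here is the same extraction in plain Python (illustrative only; the first matching level in the line wins):

```python
import re

# The alternation matches exactly one of the four known levels.
line = "2024-03-15 09:24:18 WARN [db-connector] Slow query: 3.2s on users table"

match = re.search(r"(ERROR|WARN|INFO|DEBUG)", line)
log_level = match.group(1) if match else None
print(log_level)  # → WARN
```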

Step 3: Extract the service name

One more Extract Text. The service name is wrapped in square brackets: [auth-service]. Use the between-delimiters method:

  1. Method: between delimiters
  2. Start delimiter: [
  3. End delimiter: ]
  4. New column name: service

Now you have three new columns alongside the original log line: timestamp_raw, log_level, and service.

| timestamp_raw       | log_level | service      | log_line (original)                                       |
| ------------------- | --------- | ------------ | --------------------------------------------------------- |
| 2024-03-15 09:23:45 | ERROR     | auth-service | 2024-03-15 09:23:45 ERROR [auth-service] Failed login…    |
| 2024-03-15 09:24:02 | INFO      | api-gateway  | 2024-03-15 09:24:02 INFO [api-gateway] Request completed… |
| 2024-03-15 09:24:18 | WARN      | db-connector | 2024-03-15 09:24:18 WARN [db-connector] Slow query…       |
| 2024-03-15 09:25:01 | ERROR     | auth-service | 2024-03-15 09:25:01 ERROR [auth-service] Token expired…   |

Three new structured columns extracted from a single raw text column. Now filterable, sortable, and groupable by level and service.
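The between-delimiters method amounts to taking the text after the first `[` and before the next `]`. A minimal Python sketch of that behavior:

```python
# Between-delimiters extraction: substring after the first "[" and
# before the next "]". Illustrative analog of the Extract Text step.
line = "2024-03-15 09:25:01 ERROR [auth-service] Token expired for session abc123"

start = line.index("[") + 1
end = line.index("]", start)
service = line[start:end]
print(service)  # → auth-service
```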

Step 4: Convert the timestamp

The extracted timestamp is still a string. Select Convert Type from the Transform group:

  1. Column: timestamp_raw
  2. Target type: TIMESTAMP

DuckDB's TRY_CAST() handles the conversion. Any malformed timestamps become NULL instead of causing an error. Once this is a proper TIMESTAMP, you can sort chronologically, filter by date ranges, and extract date parts.
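A rough Python analog of TRY_CAST's behavior, for intuition: parse the string, and return None (NULL) on anything malformed instead of raising. The format string below assumes the timestamp format shown above.

```python
from datetime import datetime

def try_cast_timestamp(s):
    """Rough analog of DuckDB's TRY_CAST: None on malformed input, not an error."""
    try:
        return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
    except (ValueError, TypeError):
        return None

print(try_cast_timestamp("2024-03-15 09:23:45"))  # → 2024-03-15 09:23:45
print(try_cast_timestamp("not a timestamp"))      # → None
```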

Step 5: Extract the hour for time-of-day analysis

Select Extract Date from the Transform group:

  1. Source column: timestamp_raw
  2. Part to extract: hour
  3. New column name: hour_of_day

This gives you a number from 0 to 23. Now you can group by hour to see when errors spike. Are most failures happening at 2 AM during batch jobs? Or at 9 AM when users start logging in?
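In Python terms, this step is equivalent to reading the `.hour` attribute of a parsed timestamp (again, just an analog of what the Extract Date operation does):

```python
from datetime import datetime

ts = datetime.strptime("2024-03-15 09:23:45", "%Y-%m-%d %H:%M:%S")
hour_of_day = ts.hour  # integer from 0 to 23
print(hour_of_day)  # → 9
```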

Step 6: Find the most common entries

Select Top/Bottom from the Filter & Sort group:

  1. Mode: Top
  2. Column: log_level (or service)
  3. Count: 10

This surfaces the most frequent log levels or the services generating the most entries. If one service accounts for 80% of your ERROR lines, that is where to look first.
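Conceptually this is a frequency count with the top N kept, which you can sketch with Python's `collections.Counter` (the service values below are hypothetical sample data):

```python
from collections import Counter

# Hypothetical already-extracted service values.
services = ["auth-service", "auth-service", "api-gateway",
            "db-connector", "auth-service", "batch-processor"]

top = Counter(services).most_common(10)
print(top[0])  # → ('auth-service', 3): the noisiest service
```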

Step 7: Filter to errors only

Select Filter from the Filter & Sort group:

  1. Column: log_level
  2. Operator: equals
  3. Value: ERROR

Now you are looking at only the error entries, with structured columns you can sort and group. Want to drill into a specific service? Add another filter for service = 'auth-service'.

| timestamp_raw       | log_level | service         | hour_of_day | message                                            |
| ------------------- | --------- | --------------- | ----------- | -------------------------------------------------- |
| 2024-03-15 09:23:45 | ERROR     | auth-service    | 9           | Failed login for user@example.com from 192.168.1.1 |
| 2024-03-15 09:25:01 | ERROR     | auth-service    | 9           | Token expired for session abc123                   |
| 2024-03-15 02:11:44 | ERROR     | batch-processor | 2           | Timeout after 30s processing job JOB-9902          |
| 2024-03-15 02:14:08 | ERROR     | batch-processor | 2           | Retry failed for job JOB-9902                      |

Filtered to ERROR level only. (The message column here is the remainder of each line after the closing bracket, which one more Extract Text pass can pull out.) The batch-processor errors cluster at 2 AM: a batch job failure pattern visible at a glance now that the data is structured.
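Stacking filters is just successive row selection. A minimal Python sketch with hypothetical sample rows:

```python
# Hypothetical rows of already-extracted (log_level, service) columns.
rows = [
    ("ERROR", "auth-service"),
    ("INFO", "api-gateway"),
    ("WARN", "db-connector"),
    ("ERROR", "batch-processor"),
]

errors = [r for r in rows if r[0] == "ERROR"]                 # first filter
auth_errors = [r for r in errors if r[1] == "auth-service"]   # drill-down filter
print(len(errors), len(auth_errors))  # → 2 1
```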

Going further

Once the data is structured, you can branch your analysis in several directions:

  • Use Group By to count errors per service per hour
  • Use Extract Text again to pull IP addresses or email addresses from the message portion
  • Use Window with COUNT to calculate a rolling error rate
  • Export the structured version as a Parquet file for faster re-analysis later
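For the second bullet, patterns along these lines would work in another Extract Text pass (shown here in Python for easy testing; both regexes are simplified illustrations, not fully RFC-strict):

```python
import re

message = "Failed login for user@example.com from 192.168.1.1"

# Simplified patterns: dotted-quad IP and a loose email shape.
ip = re.search(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", message)
email = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", message)

print(ip.group(0))     # → 192.168.1.1
print(email.group(0))  # → user@example.com
```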

The full pipeline

  1. Extract Text - regex for timestamp
  2. Extract Text - regex for log level
  3. Extract Text - between delimiters for service name
  4. Convert Type - timestamp_raw to TIMESTAMP
  5. Extract Date - hour from timestamp
  6. Top/Bottom - most common error types or services
  7. Filter - log_level = ERROR

Seven steps to go from a blob of text to a structured, filterable log analysis table. No grep, no awk, no regex debugging in a terminal. The pipeline is visible, each step is editable, and the SQL is generated for you.

Analyze your log files →

Try it yourself

No sign-up, no upload, no tracking.

Open ExploreMyData