Log Search - Overview — SIOBHAN FISHER


Project: Redesigning and adding to Log Search, a core functionality of the Rapid7 InsightIDR product

Elements: User Research | IA | Interaction Design | Prototyping | Strategy

Where: Rapid7

Summary

What is InsightIDR’s Log Search?

Log Search is a core functionality within Rapid7’s InsightIDR product that gives specialist cybersecurity analysts visibility into activity within their network so they can detect and respond quickly to potential cyberattacks. It gathers vast amounts of data in one place and allows analysts to search and visualize that data.

Log Search is a multipurpose tool that has to serve many different use cases, for example:

An alert has fired and the analyst needs more information to determine whether it is real,
An attack has happened and the analyst needs to understand what actually happened and how bad it is,
The analyst is proactively searching for threats.

In other words, Log Search is the source of truth that helps cybersecurity analysts find evil.

Why a redesign?

Performance, performance, performance

The redesign of Log Search started as an engineering-led project to address front-end performance issues that had been a source of customer complaints. Front-end latency makes it take longer to build a search and review the results, which means it takes longer to find evil. The latency was worst in environments with a lot of data, and given that the trend is towards ever more data, the problem would only get worse over time.

Reusability

To truly help analysts find evil faster, Log Search should be an integrated functionality available to analysts wherever they need it across the Rapid7 platform, not just a page they visit to search manually. To make this goal more achievable, the engineering solution to the performance issues also made it easy to reuse Log Search components across the platform.

Usability

The engineering solution to the performance issues required rebuilding the entire Log Search page. The initial proposal was essentially a “lift and shift” of legacy Log Search. Users love legacy Log Search, and there is a lot to love. Nonetheless, we had a large collection of suggested improvements based on customer and internal feedback, and a basic heuristic analysis revealed widespread usability issues. I was not going to pass up the perfect opportunity to make improvements. With support from product management, I transformed the “lift and shift” into a redesign focused on analyst workflows and usability issues.

My Role

I was lead designer for Log Search. I partnered with product management to shape the scope of work, define problems, and prioritize features. I personally completed the design work shown in my portfolio. I also scoped out individual projects for a junior designer whom I mentored. I collaborated closely with user research, sometimes carrying out research myself and at other times helping the research team plan studies. I drafted all UI text and then supported the project content designer as they wrote the final text. I used design system components as much as possible and sought input from the design system team where needed.

Guiding Principles

More information can be found in the Guiding Principles section below.

Balance respect for learned patterns and the need to make improvements
Consider the primary goals in every design decision:
    Improve performance
    Make reusable components
    Improve usability
Ensure cross-product consistency
Right-size effort for maximum impact

The Process

More information can be found in the Process section below.

The redesign
Iterating on feedback
Integration outside the Log Search page

Continuous research

The Solution

We created an early access Log Search page that existed alongside the primary Log Search page until it had enough features to become the primary Log Search page. The old Log Search page was then renamed “Legacy Log Search” and will continue to exist until it can be deprecated.

The new Log Search page intentionally keeps a structure very similar to the legacy page so as not to disorient existing users, but it also contains many usability enhancements.

Figure: new Log Search (showing the query bar, timeline, and a JSON-formatted log entry) compared with legacy Log Search.

The new page lets analysts build queries and pivot from results quickly and confidently, with fewer user errors. Its clear and consistent information architecture is built around analyst workflows, making actions easier to find and results quicker to review and easier to interpret with confidence, so analysts can find and stop evil more quickly.

The Product

How does the redesigned Log Search help analysts find threats? I will walk through an example using dummy data and images from the released product, which is available as a free 30-day trial at Rapid7.com.

If cybercriminals have gained access to systems they shouldn’t have, looking at “Active Directory Admin Activity” in Log Search can reveal telltale signs. Inexperienced analysts may not know which keys to search to find suspicious activity. An easy way to see how the data is structured is to run an empty search over a short time range, which returns results quickly.

Now that they are looking at data, they can use the context menu to build a query with no typing. In legacy Log Search, a context menu existed only on values. In the redesign, we enhanced it and added it consistently across data types, including values, keys, time, and bar chart bars. Running a “groupby” on the “action” key groups the data by the different actions admins have taken and displays the results on a bar chart. Because the analyst is now looking for something more specific and wants to see activity over time, they should also expand the time range to 7 days.

Now that they can see the different actions, they notice there have been a few privilege escalations. If an attacker has gained access to a compromised user account, they can use privilege escalation to grant themselves access to, for example, files with sensitive financial data. Clicking a chart bar opens the context menu and allows quick pivoting to the underlying data. In this example, the analyst pivots to all the privilege escalations from the last 7 days.

To spot suspicious activity amongst the legitimate activity, they look at who has been taking these actions. Again, this can be done quickly with no typing using the context menu. The analyst first looks at “source_user” to see who has been taking actions. Then they select “target_user” to set up a two-layer chart that quickly shows whose privileges have been escalated.

The second bar on this chart is highly suspicious. Changing privileges is a perfectly legitimate activity for “adm_mstewart”; however, Jimi Hendrix has no reason to be doing this. Because the analyst set up a multilayer chart in the previous step, they can use the context menu to drill in and see, in one click, that Bill Lumbergh’s privileges have been escalated. Bill Lumbergh has no business having admin credentials, so at this point it is highly likely we are dealing with compromised accounts and an attacker.
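For readers less familiar with log query languages, the grouping and pivoting steps in this walkthrough can be sketched in a few lines of Python. This is a minimal conceptual illustration, not how Log Search or its query language is implemented. Assumptions: the log entries are hypothetical dummy data (the account names jhendrix, blumbergh, asmith, and kpatel are invented for this sketch), `groupby` and `where` are hypothetical helper functions whose names only echo the query operations described above, and the 7-day time range is omitted for brevity.

```python
from collections import Counter

# Hypothetical dummy "Active Directory Admin Activity" entries. Key names
# follow the walkthrough; all values are invented for illustration.
LOGS = [
    {"action": "PASSWORD_RESET", "source_user": "adm_mstewart", "target_user": "asmith"},
    {"action": "PRIVILEGE_ESCALATION", "source_user": "adm_mstewart", "target_user": "asmith"},
    {"action": "PRIVILEGE_ESCALATION", "source_user": "adm_mstewart", "target_user": "kpatel"},
    {"action": "PRIVILEGE_ESCALATION", "source_user": "jhendrix", "target_user": "blumbergh"},
    {"action": "ACCOUNT_LOCKED", "source_user": "adm_mstewart", "target_user": "kpatel"},
]


def groupby(entries, *keys):
    """Count entries for each combination of values of `keys` (hypothetical helper).

    One key corresponds to a single-layer chart; two keys to a two-layer chart.
    """
    return Counter(tuple(entry.get(key) for key in keys) for entry in entries)


def where(entries, **criteria):
    """Keep only entries matching every key=value pair (hypothetical helper, a 'pivot')."""
    return [e for e in entries if all(e.get(k) == v for k, v in criteria.items())]


# Step 1: group all admin activity by action (the first bar chart).
by_action = groupby(LOGS, "action")

# Step 2: pivot from the PRIVILEGE_ESCALATION bar to its underlying entries.
escalations = where(LOGS, action="PRIVILEGE_ESCALATION")

# Step 3: two-layer grouping showing who escalated whose privileges.
by_user_pair = groupby(escalations, "source_user", "target_user")

# Step 4: drill into the suspicious source user.
drill_in = groupby(where(escalations, source_user="jhendrix"), "target_user")

for (source, target), count in sorted(by_user_pair.items()):
    print(f"{source} -> {target}: {count}")
# adm_mstewart -> asmith: 1
# adm_mstewart -> kpatel: 1
# jhendrix -> blumbergh: 1
```

In Log Search itself these steps are a few clicks in the context menu; the sketch only makes explicit, conceptually, what each step does to the data.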

Without any further searches or context switches, the analyst already knows whom to contact to confirm whether their accounts have been compromised. The analyst can also take action to block these users, and therefore the attacker, limiting any potential damage. But the investigation shouldn’t end there. Inexperienced and overwhelmed analysts can fall into the trap of addressing the immediate threat and assuming the incident is over. Log Search has all the data the analyst needs to see what else these compromised accounts have been up to, whether other accounts have been compromised, and which files and systems have been accessed. The analyst can then be confident they have actually stopped the attacker and understand the full extent of the attack.

The analyst can use what they have learned in this search to streamline future work for themselves and their colleagues by saving the chart they created to a dashboard. In the redesign, we made this possible directly from Log Search: from the “Sharing Actions” menu, select “Save as Dashboard Card”, choose the dashboard to save to, and give the card a name. Now you can quickly see all the escalations in your environment over the selected time range, right on your dashboard.

In this example, the redesigned Log Search allowed an analyst to quickly identify suspicious activity and streamline their future workflow by saving the chart they created directly to a dashboard. The more time an attacker spends in an environment, the more damage they can do, so finding and stopping them quickly is crucial.

Guiding Principles

As mentioned in the summary, the four guiding principles for this project were:

Balance respect for learned patterns and the need to make improvements
Consider the primary goals in every design decision
Ensure cross-product consistency
Right-size effort for maximum impact

1) Balance respect for learned patterns and the need to make improvements

When you are redesigning a tool to help users in a high-stress job, where time can mean the difference between catching an attacker before they get access to sensitive information and catching them too late, you have to be very intentional about what you change.

You have to consider existing interaction and information architecture patterns to:

Not disorient and slow down existing users,
Not break existing workflows, especially when users have put considerable time and effort into developing specific workflows to deal with the vast quantity of different known cyber attacks, and
Ensure that you don’t lose what users like about the current product.

2) Consider the primary goals in every design decision

The primary goals of the Log Search redesign were:

Improve performance,
Make reusable components, and
Improve usability

As a UX designer, it is easy to focus only on usability. However, if we hadn’t improved performance and made a reusable experience, this project would have been a failure, and I would have lost all the influence I needed to push for usability enhancements.

Improve performance

We generally think of performance as an engineering concern, but reducing performance-related complaints requires improving two things:

Performance, and
Perception of performance.

Design drove improvements to both of these by:

Adding features that help users create efficient searches that put less strain on the system.
Balancing potential usability improvements against performance strain on the system. For example, deciding whether to display information that may help users understand their data but requires a supplemental back-end operation that slows down loading of the entire screen.
Considering perception of performance in all design decisions. For example, showing a completion percentage on long-running searches improves user confidence and perception of performance. However, technical constraints make the percentages misleading on searches that load quickly, and showing them there may make users perceive searches as slower than they actually are.
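The progress-percentage trade-off above amounts to a simple display rule. The sketch below is a hypothetical illustration, not the shipped implementation: the function name `search_status_label` and the 3-second cut-off are assumptions chosen for the example. It shows a plain message until a search has run long enough to count as long-running, and only then shows the percentage.

```python
# Hypothetical cut-off (an assumption for this sketch): before this much time
# has elapsed, a search is treated as "quick" and its progress percentage is
# hidden, because for quick searches the percentage may be misleading.
LONG_RUNNING_THRESHOLD_SECONDS = 3.0


def search_status_label(elapsed_seconds: float, percent_complete: float) -> str:
    """Return the status text shown while a search is running."""
    if elapsed_seconds < LONG_RUNNING_THRESHOLD_SECONDS:
        # Quick searches: a plain message avoids a misleading percentage.
        return "Searching..."
    # Long-running searches: a percentage reassures the user that work is progressing.
    return f"Searching... {percent_complete:.0f}%"


print(search_status_label(0.8, 35.0))   # Searching...
print(search_status_label(12.0, 42.4))  # Searching... 42%
```

The threshold is the design lever here: set it too low and quick searches flash misleading numbers; set it too high and users wait on long searches without reassurance.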

Make reusable components

In enterprise work, we are not designing standalone web pages or single-purpose apps; we are designing elements of incredibly complex ecosystems. Reusability is not a nice-to-have, it is a necessity. The implication for design decisions is that we are never solving for just one use case: we have to consider everywhere an individual component is currently used and is likely to be used in the future.

For Log Search specifically, components were already used across the product even before the redesign. For example, the Log Search query bar is used on the main Log Search page, in the in-context Log Search slide-out, for creating dashboard cards, and for writing alerts. So every decision about the query bar had to work in all these contexts.

Further, our ultimate goal is to make Log Search an integrated component across the platform. This means thinking of Log Search not just as a page, but as individual components that could be taken out and used in different contexts. For example, if we want to use the results display to show specific log lines that have been added to an Investigation, the user should not need to write a query to find these log lines, so there is no need for a query bar. How might we ensure that the results section makes sense as a standalone component?

Improve usability

Based on the usability issues we had observed in legacy Log Search, we focused on four key principles to ensure that we didn’t recreate the mistakes of the past.

Consistent interaction behavior
Minimizing loss of work through accidental query runs
Clear and consistent information architecture
Communicating defaults for more reliable data interpretation

3) Ensure cross-product consistency

Cross-product consistency is a win-win because it:

Improves the user experience with predictable interaction patterns that reduce cognitive load, and
Reduces engineering effort with reused components that are quicker to implement and easier to maintain than individual bespoke components.

There are two ways in which we strived for cross-product consistency:

Using design system components wherever possible. Sometimes this meant improving upon those components, which takes more time than just building a bespoke component, but ultimately saves time across the platform as those improvements are then available for other teams and, as previously mentioned, reused components are easier to maintain.
Seeking out and reusing patterns from other areas of the product. Sometimes what we needed didn’t exist in the design system (yet…) but had already been built for other areas of the product. In these cases we worked with the appropriate teams to componentize the existing code and then use those components in our project.

4) Right-size effort for maximum impact

In enterprise UX we rarely have the time or resources to do the full UX process on every project, so we have to focus our efforts on where design work will have the biggest return on investment. Figuring this out requires balancing the level of confidence in the design decisions with the customer value of the work and the potential impact of design.

Confidence

The more confident you are in a design decision, the less design time you need to dedicate to it. This can mean, for example, skipping research validation and iteration cycles. The irony is that the way to get a high level of confidence in design decisions is precisely through having a clear understanding of the customer problem, which comes from doing research, and through validating and iterating on designs. That said, there are some circumstances where we can be relatively confident in decisions without doing all of that: for example, if a different area of the platform solves a very similar problem and has already been validated, or if we are using a well-established UX pattern.

Customer Impact

Conversely, the higher the customer impact, the more design time should be dedicated to a feature or project. Being able to put a lot of effort into finding the right solution through research, validation, and iteration is what UX designers live for. The flip side, putting only minimal effort into a low-priority project, is incredibly difficult. We want to solve every problem to the best of our abilities, so being OK with a just-OK design solution on a low-priority project requires clear prioritization and self-control.

There will always be more design work than the design team can reasonably do, and ultimately the customer is best served by designers putting our efforts into projects that have a high customer impact. For example, you could spend three months coming up with a fantastic solution for an important but infrequently used feature, such as settings, and have no time left to work on anything else. Or you could time-box the work on settings to two weeks, produce an acceptable solution, and have two and a half months left to do really great work on features that analysts use in their day-to-day workflow. Even though the settings experience won’t be as great, the overall user experience is better served by putting the time into the frequently used features that have a higher customer value.

Design Impact

The final factor is design impact. Not everything can be significantly improved by design, and with limited resources we have to focus where our particular set of skills has the most impact. For example, there is only so much we can improve on a simple form that follows design system patterns, whereas a complex workflow has lots of opportunity for unnecessary friction and can be hugely improved by design input.

Pulling it all together

Confidence, customer value, and design impact must all be weighed together when prioritizing design effort. In the hypothetical (and non-existent) circumstance where you are 100% confident in your design decision, customer value and design impact don’t matter: you shouldn’t put effort into validating and iterating on a no-brainer decision. Similarly, a simple form used daily has high customer value, but it would represent a low return on investment for design effort because of the low design impact.

But what about time?

In the real world, the biggest factor determining how much design effort we can put in, and therefore how confident we can be in the final design, is the time we are given, and it shouldn’t be. Hearing that engineering is blocked waiting on design is the stuff of designer nightmares, especially when management is clamoring to see the feature finished.

But the amount of time that goes into design should be determined by confidence, customer value and design impact. Period.

Design can only have a minimal impact if we are not given the time to research, validate, and iterate. Across the organization, shortcutting design doesn’t save any time. It just means validating decisions in code, and code is more time-consuming, and therefore more expensive, to iterate on than prototypes.

As designers, we need to be able to estimate and articulate the effort of a given project and fight for the time it needs.

Getting more design time also doesn’t automatically mean slowing down engineering; it can simply mean having a frank conversation about what level of design fidelity engineering really needs to move forward. For example, if engineers are going to spend the first month of work wiring up the back end, they don’t need pixel-perfect designs to start, just a confident enough idea of the functionality, which could be communicated in wireframes, workflow diagrams, or even a conversation. You can then use the month before they touch the UI to get those pixel-perfect designs ready.

If you really can’t secure the time you need to do the appropriate level of design work, you can do your best with lower-effort methods like competitive analysis and using only existing components and patterns. If you do this, make sure you communicate very clearly about the relative lack of confidence in the design decisions.

Engineering effort

Right-sizing is not just about design effort; it is also about choosing solutions whose engineering effort is appropriate to their customer value. This involves working closely with engineers to understand the level of effort of different solutions. When unexpected challenges come up, it also means talking through how we can still solve the customer problem without incurring undue engineering effort.

The Process

Cross-functional scrum team

Throughout this project I worked with a cross-functional scrum team made up of engineering, product management, and UX design and content. I attended both stand-ups and team meetings. Further, during the first phase of the project I set up a weekly cross-functional sync specific to the team, where both design and engineering would show what they were working on. This allowed us to spot misunderstandings or issues before they found their way into the product, while they were still much easier to fix. I got to know the engineers individually and was able to tailor designs, right-sizing the design effort needed to communicate clearly with each engineer for each specific feature. Some engineers were comfortable with wireframes and a conversation when using standard components. Other engineers and features needed fully annotated high-fidelity designs, especially for custom components.

Scope and prioritize the work

We started the project with a kickoff workshop with the whole extended cross-functional team. We gathered our collective knowledge of what was good and what could be improved in legacy Log Search. We discussed what we thought should be included in the redesign and how it should be broken down into milestones.

Throughout the entire project, I partnered closely with the product manager to continue scoping and prioritizing work, ensuring that it met the plan we had laid out as a team or adjusting the features or plan as needed. Feature scope and priorities were based on user feedback and were discussed and validated with the cross-functional team and internal subject matter experts.

The work done on new Log Search at the time of writing can be broken down into three major chunks:

The redesign
Iterating on feedback
Integration outside the Log Search page

1) The redesign

The redesign part of the project consisted of everything that got us to the point where new Log Search became “the” Log Search and old Log Search became “Legacy” Log Search. This was broken down into three milestones:

Milestone	Theme	End Goal
1) Proof of Concept	Searching	Limited preview access
2) Minimum Viable Product	Understand and iterate on searches	Open preview access for all
3) Minimum Lovable Product	Streamlining workflows	General availability / Default to new Log Search

2) Iterating on feedback

Throughout the redesign, we collected feedback from customers, internal users, and customer-facing colleagues. We collated and prioritized this into a feature set focused on display enhancements and common workflow pivots.

3) Integration outside of the Log Search page

To move closer to the goal of making Log Search an integrated tool, not just a standalone page, we updated the existing slide-out version of Log Search on the Investigations page of InsightIDR and added a version that takes users directly to the log lines that triggered a given alert within an Investigation. Some customers told us that this latter enhancement cut the time to investigate certain alerts in half.

Continuous research

We were continuously carrying out generative research and validation throughout the whole project. Research efforts were similarly right-sized according to confidence, customer value, and research impact.

We used a wide variety of research techniques including:

Internal subject matter expert feedback group

I set up a group of carefully chosen internal subject matter experts that included customer-facing colleagues, such as sales engineers, and internal users, such as managed detection and response analysts. Every two weeks, Log Search design and product management would present anything from ideas to designs to get feedback from the group. This was an invaluable lightweight gut check for low customer value features, and an indicator of whether further research was needed for higher customer value features. The group also had a companion Slack channel for asynchronous feedback.

Competitive analysis
Usability testing (internal and customer)
User interviews (customer)
Surveys
Freeform user feedback
Pendo usage statistics