Monday, December 16, 2019

Notes on Agile Retrospectives

I recently read (well, skimmed through) the book Agile Retrospectives - Making Good Teams Great by Esther Derby and Diana Larsen. The authors share lots of insight & experience on running retrospectives.

What is a retrospective? Quote from the book:

When we say retrospective, here's what we have in mind: a special meeting where the team gathers after completing an increment of work to inspect and adapt their methods and teamwork.

Some benefits of retrospectives the authors have heard:

  • Improved productivity
  • Improved capability
  • Improved quality
  • Increased capacity

Typical structure for a retrospective

  • (1) Set the stage
    • Welcomes & appreciation of people's time
    • Duration, goal & purpose of the retrospective
    • If the team has working agreements, post and review them.
  • (2) Gather data
    • Ask "What?"
  • (3) Generate insights
    • Ask "Why?"
    • Think about what to do differently.
  • (4) Decide what to do
    • Plan experiments and actions
    • Pick only the top items (for an iteration retro, 1-2 could be enough)
    • Check that people sign up and commit to the selected tasks.
  • (5) Close the retro
    • Decide how to document what the team has learnt
    • Plan for follow-up

Tailoring a retrospective for your team - Planning the retro

Knowing the context

When preparing the retro, aim to have answers to the questions:

  • What is the context of the team?
    • If you're working with your own team, you probably already know the history & context of your team.
    • If you're working with a team other than your own, study the context. (To get clues about what questions to ask and what challenges the team might have.)
  • What's the goal for the retro?
  • How long will the retro be?
  • Where will the retro be held?
    • A setup where everybody can see other people's faces is preferred.
    • Whiteboards for post-its, walls for timelines, flip charts etc.
  • What's the structure of the retro?
    • How much time for the different phases?
    • If the retro is longer than two hours, remember to have break(s)

Selecting activities for the phases

Based on those answers, you can select the activities for the different phases.

  • The book has a good selection of activities for each phase.
  • The authors note that the ARCS criteria for evaluating instructional designs can also be applied to retro activities:
    • Attention
    • Relevance
    • Confidence/Competence (activities that the people can complete successfully)
    • Satisfaction
  • As a tip, you can choose alternative activities (longer & shorter) to help with managing the time.

Leading a retro

  • Facilitator vs. participant
    • As a facilitator your primary responsibility is the process.
    • Participants focus on the content, discuss and make the decisions.
  • Managing the activities
    • As a tip, when using an activity for the first time, write a script for yourself.
    • If multiple parts, give the details for each part.
    • After giving instructions, ask for questions, pause and count to ten.
  • Two main tasks during the activity:
    • Answer questions about the activity
    • Monitor the room
  • Debriefing the activity, e.g. with the following:
    • Ask "What did you see and hear?"
    • Ask how people responded: "What surprised you? Where were you challenged?"
    • Ask for insight & analysis
    • Ask how the insights could be applied
  • Related to group dynamics:
    • Aim for everybody to participate. Make sure
      • People with something to say have the chance.
      • People with lots to say don't dominate.
    • After participation, the next most common issues are violating the working agreements & blaming.
  • If the team becomes stuck, you can e.g. ask something like:
    • What have we tried before? What happened? What would you like to happen differently?
    • If we had that, what would we gain?
    • Have you ever tried this a different way? What happened?
  • Managing time
    • Have some timepiece to time activities etc.
  • Managing you
    • If needed, remember to take a deep breath.
    • Take a break if needed
      • Shake out your hands & legs
      • Take three deep breaths to get oxygen to think clearly

Activities for the different phases

The authors have a good selection of activities for each phase. Please check out the book yourself.

In addition, here are some links to collections of retrospective activities etc.:

Wednesday, November 27, 2019

Notes on Lambdacast, Category Theory etc.

I recently finished listening through LambdaCast, "a podcast about functional programming for working developers". Some notes on statically typed functional programming, basic category theory etc. will follow. I concentrate here mainly on the topics I haven't used so much, thus skipping some of the topics and focusing mainly on the category-theory-ish parts. The podcast described category theory as the "math of maths".

On notation

I use here mainly Elm/Haskell-ish notation.

Parametric types

More notes below, but I'll mainly use Elm/Haskell-ish notation for parametric types:

  • List a means a list of values of type a, e.g. List String or List Int.
  • In Java/C# these are written e.g. List<String> and in Scala as List[String]
  • More on this below under "Polymorphism & type parameters".

Function type signatures

  • f: A -> B -> C means a function f that takes two parameters of types A and B and returns a value of type C.
  • With partial application, this can be interpreted as a function that takes a parameter of type A and returns a function of type B -> C.
  • So f: A -> B -> C means the same as f: A -> (B -> C).

Some examples

  • List.length: List a -> Int - function that takes a list and returns its length as an Int.
  • List.filter: (a -> Bool) -> List a -> List a - function that takes a "predicate function" and a list, and returns a new filtered list.

Note that Haskell has roughly same notation but with two colons, e.g. f :: A -> B -> C.

Applying function

Write the function name followed by the parameters. Note that no parentheses are used (unlike in e.g. JavaScript/Java/C#). For example, if we have an addition function add: Int -> Int -> Int, it would be applied as add 1 2, evaluating to 3.
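A minimal runnable Haskell sketch of this (the add function is hypothetical):

add :: Int -> Int -> Int
add x y = x + y

three :: Int
three = add 1 2    -- applied without parentheses

inc :: Int -> Int
inc = add 1        -- partial application: supply only the first argument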

Function composition

Function composition is mentioned a couple of times. I'll use Haskell's infix dot notation, described by an example:

Let's say we have the two functions:

string2int :: String -> Int
int2bool   :: Int -> Bool

We could chain the functions as (assuming we have a string str):

myBool = int2bool (string2int str)

This can be also written with function composition .:

myBool = (int2bool . string2int) str
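To make the example concrete, here's a runnable Haskell sketch with hypothetical implementations for the two functions:

string2int :: String -> Int
string2int = length          -- hypothetical: use the string's length

int2bool :: Int -> Bool
int2bool n = n > 3           -- hypothetical: check against a threshold

myBool :: Bool
myBool = (int2bool . string2int) "hello"   -- True, since length "hello" == 5 > 3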

Various terms

Domain and codomain

If there is a function fun: A -> B (that takes a value of type A and returns a value of type B):

  • All possible input values (A) are called the domain of the function.
  • All possible result values (B) are called the codomain of the function.

Note that domain here is a different term than the one used in domain-driven design.

Associative property

Associative property is a property of a binary operation. Roughly, for a binary operation it means that (x ⊗ y) ⊗ z = x ⊗ (y ⊗ z). This holds true for example for addition, e.g. (2 + 3) + 4 = 2 + (3 + 4).

Types, categories and sets

These are somewhat related concepts and not totally identical. I'm going to cut quite a few corners and mainly use the term type.

Morphisms (Episode 8)

Some corners cut, a morphism is a mapping/relationship from one type to another. The word comes from ancient Greek morphe (“form, shape”). In a programming context, a function f: A -> B from type A to type B is a morphism.

Note that a morphism can have the same source and target type. An endomorphism (from ancient Greek endon, “inner, internal”) is such a morphism. Such a function would look like fun: A -> A, for example. As a practical example, the function inc: Int -> Int (that increases its argument by 1) is an endomorphism with type A being Int. For a more mathematical description, see Endomorphism in Wikipedia.

Isomorphism (from ancient Greek ísos, “equal”) is a morphism that has an "inverse morphism". This can also be described as "no data is lost in the transformation". With types A and B that would mean that there are functions fun1: A -> B and fun2: B -> A such that both compositions are the identity: fun1 . fun2 = identity and fun2 . fun1 = identity (i.e. fun1 (fun2 x) = x and fun2 (fun1 y) = y). For a more mathematical description, see Isomorphism in Wikipedia.
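As a small Haskell sketch, swapping the elements of a pair is an isomorphism, since swap is its own inverse:

import Data.Tuple (swap)

-- swap :: (a, b) -> (b, a); swap . swap is the identity, so no data is lost
check :: Bool
check = (swap . swap) (1 :: Int, "x") == (1, "x")   -- True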

Homomorphism (from ancient Greek homós, “same”) is a morphism that "preserves structure", e.g. List.map or List.filter. For a more mathematical description, see Homomorphism in Wikipedia.

Catamorphism (from ancient Greek katá, “downwards”) is a morphism from an algebra to another. List.fold was mentioned as an example. For a more mathematical description, see Catamorphism in Wikipedia. The dual/opposite of catamorphism would be Anamorphism (from ancient Greek aná, “on, up, above, throughout”). Cata- & anamorphisms (as well as hylo- & paramorphisms) are discussed within the FP context in the paper Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire (link in the end)

Alongside morphisms, denotational design/algebra was also mentioned. That would mean concentrating on the verbs and nouns of an "algebra", which could be e.g. HTTP requests. This sounded somewhat related to DDD. The following presentation was recommended: Denotational Design: From Meanings To Programs.

Polymorphism & type parameters (Episode 9 & Episode 21)

The name polymorphism comes from ancient Greek polús (“many”), indicating something that can have multiple different shapes.

First, in some contexts, especially object-oriented programming, polymorphism usually means subtype polymorphism. A common example of this is a class hierarchy with classes Animal, Cat extends Animal and Dog extends Animal, where instances of Dog and Cat are also instances of Animal. This is quite a different concept from the one described here.

In the functional programming context (with static types), polymorphism usually means parametric polymorphism. This means having types that have type parameters. Java/C# generics are quite the same concept, some corners cut. A typical example would be a list that is parameterized to contain elements of a certain type.

In e.g. Elm/Haskell this is written as List a, in Scala List[A] and in Java/C# List<T>. As a detail, in Haskell the type parameter is typically written in lower case whereas in Scala/Java/C# it's typically written in upper case.

Semigroups and monoids (Episode 12)

A semigroup is a type with an associative binary operation that combines two values of the type, resulting in a value of the same type. This operation is often called append or concat. The function signature would be append: A -> A -> A.

Many semigroups have an "identity" element e such that e ⊕ x = x ⊕ e = x for every x in the semigroup. A semigroup with an identity element is called a monoid.

Some examples of monoids:

  • Integers under addition, 0 being the identity element
  • Natural numbers under the maximum operation, 0 being the identity element
  • Lists under the append operation, the empty list [] being the identity element
  • Strings under concatenation, the empty string being the identity element
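In Haskell these correspond to the Semigroup/Monoid type classes, where <> is the append operation and mempty is the identity element. A small sketch:

import Data.Monoid (Sum(..))

listExample :: [Int]
listExample = [1, 2] <> [3]            -- [1,2,3]: lists under append

stringExample :: String
stringExample = mempty <> "abc"        -- "abc": the empty string is the identity

sumExample :: Sum Int
sumExample = Sum 2 <> Sum 3 <> mempty  -- Sum 5: integers under addition, 0 as identity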

Algebraic data types (ADTs) (Episode 13)

An algebraic data type system is a type system where types are built from other types as product types & sum types.

Product types ("AND" types) are pretty much like typical struct/class types, a combination of values, e.g.

type Card = Card
    { suit : Suit
    , rank : Int
    }

The term product type comes from the idea that the number of all possible values of Card is obtained by multiplying the number of possible values of Suit by the number of possible values of Int.

Sum types ("OR" types) are types that have alternative "choices".

Following the playing card example, a very simple example would be Suit:

type Suit = Diamond
          | Heart
          | Club
          | Spade

So far this is like enums in many programming languages. What makes sum types more powerful is the possibility to add data to the values. Our earlier Card example did not support jokers but let's redefine it with a sum type:

type CardV2 = StandardCard Suit Int
            | Joker

With this, a StandardCard value will consist of a suit and a rank, in addition to its type. The term sum type comes from the idea that all possible values of CardV2 are the possible values of StandardCard + Joker.
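For comparison, a Haskell sketch of the same sum type, with pattern matching over the alternatives (the joker's value of 0 is a made-up rule):

data Suit = Diamond | Heart | Club | Spade

data CardV2 = StandardCard Suit Int
            | Joker

-- Pattern matching forces us to handle every alternative:
value :: CardV2 -> Int
value (StandardCard _ rank) = rank
value Joker                 = 0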

Functors (Episode 16)

Very informally, a functor is a "containerish structure" (polymorphic type f a) that has a map operation (in Haskell world fmap), with the signature map: (a -> b) -> f a -> f b.

A "mathematical" functor follows two basic rules:

  • Applying map with identity function keeps the functor as it is: map id f == f
  • Applying map is composable: map (fun1 . fun2) x == map fun1 (map fun2 x)

List & Maybe are typical examples of functors.

It's good to note that in mathematics context, the term functor is used mainly for the function/mapping, whereas in FP context the term is used more to describe the type that has map/fmap operation.

Note that partial application for map with a function (a -> b) results in a function of type (f a -> f b) that works "inside the functor" - this is called lifting.
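A small Haskell sketch of mapping and lifting with Maybe (shout is a hypothetical helper):

import Data.Char (toUpper)

ex1 :: Maybe Int
ex1 = fmap (+ 1) (Just 41)   -- Just 42
ex2 :: Maybe Int
ex2 = fmap (+ 1) Nothing     -- Nothing

-- Lifting: partially applying fmap gives a function that works "inside" Maybe
shout :: Maybe String -> Maybe String
shout = fmap (map toUpper)   -- shout (Just "hi") == Just "HI"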

Applicative functors (Episode 17)

An applicative functor is an abstraction between functor and monad, and a somewhat newer addition in the Haskell world.

For an applicative functor type, there are two main operations (in addition to the functor's map/fmap):

  • Function for wrapping a value inside applicative functor: pure: a -> f a
  • apply: f (a -> b) -> f a -> f b
    • Note that in Haskell apply is named <*>

apply might feel a bit strange at first, but combined with partial application it makes it possible to apply a binary function to values inside two applicative functors.

As a simple example, we have two Maybes containing ints and we want to add them up (resulting in a Maybe):

  • We can map the addition function over a Maybe Int to get a Maybe (Int -> Int) (e.g. Just 5 would result in a function of one argument that increases its argument by 5)
  • The Maybe (Int -> Int) can be given to apply with the other Maybe, resulting in a Maybe containing the sum.
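A Haskell sketch of this example (<*> is apply, <$> is infix fmap):

sumMaybes :: Maybe Int -> Maybe Int -> Maybe Int
sumMaybes ma mb = (+) <$> ma <*> mb   -- equivalently: pure (+) <*> ma <*> mb

-- sumMaybes (Just 5) (Just 2) == Just 7
-- sumMaybes (Just 5) Nothing  == Nothing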

Monads (Episode 18)

Monad adds flatMap on top of applicative functor. Type signature is flatMap: m a -> (a -> m b) -> m b. In Haskell context, flatMap is named >>= and often called "bind".

  • This kind of "forces" the operations to be sequential
  • Haskell has "do notation" that makes it more convenient to work with monads.

Monads are not as complex a topic to grasp as they might sound. They have many applications:

  • Lots of uses with lists & maybes/optionals
  • Used often with futures and promises so that the result from the previous async operation is used to initiate the next one
    • E.g. first do an HTTP request to fetch users and after that another to get some extra resources for the users.
  • Haskell has the IO monad, stating explicitly that something has side effects / state
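A small Haskell sketch of sequencing with bind, where each step runs only if the previous one produced a value (half and parseAndHalve are hypothetical helpers):

import Text.Read (readMaybe)

half :: Int -> Maybe Int
half n = if even n then Just (n `div` 2) else Nothing

parseAndHalve :: String -> Maybe Int
parseAndHalve s = readMaybe s >>= half
-- parseAndHalve "10" == Just 5
-- parseAndHalve "3"  == Nothing  (3 is odd)
-- parseAndHalve "x"  == Nothing  (parse fails, half is never called)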

Monday, November 11, 2019

Notes on Two Systems - first part of Thinking, Fast and Slow

Notes on the first part of Thinking, Fast and Slow (Two Systems).

The characters of the story

  • Two modes of thinking: System 1 (Fast thinking / automatic system) and System 2 (Slow thinking / effortful system). These terms were originally proposed by psychologists Stanovich & West
    • System 1: Operates automatically and fast, with no or little effort, no sense of voluntary control
    • System 2: Conscious, explicit, effortful thinking, concentration.
  • System 2 has some ability to change the way System 1 works.
  • System 2 can influence where the attention is allocated (see e.g. "The invisible Gorilla")
  • System 1 works automatically, System 2 is usually in a comfortable low-effort mode
  • System 1 can "call System 2" when e.g. a question arises for which System 1 doesn't offer an answer
  • Illusions (both visual and cognitive)
    • e.g. Müller-Lyer illusion
    • System 2 can choose to believe that the lines are of equal length but it can't (at least without lots of training) prevent System 1 from seeing the lines to be of different length.
  • NOTE: System 1 and System 2 are fictitious characters

Attention and effort

  • System 2's operations are effortful.
  • As a note: the size of the pupils indicates mental effort
  • If System 2 is under load, attention to other tasks (also of System 1) is reduced.
    • "... people, when engaged in a mental sprint, may become effectively blind" (e.g. "The invisible Gorilla")
  • Switching from one task to another is effortful.
  • Time pressure is another driver of effort

The lazy controller

  • For walking, there is usually "a natural speed" with which no strain or "need to push" is experienced
    • Also System 2 has a "natural speed"
  • System 2 follows "the law of least effort"
  • Maintaining a coherent train of thoughts requires discipline
  • Note: "flow" (studied by Mihaly Csikszentmihalyi)
    • People describe flow as "a state of effortless concentration so deep that they lose their sense of time, of themselves, of their problems"
  • A well-established proposition: Both self-control and cognitive work are forms of mental work (by System 2)
    • They both draw at least partly on a shared pool of mental energy (studied by Roy Baumeister)
  • Ego depletion
    • Effort of will or self-control is tiring.
    • If you have had to force yourself to do something, you are less willing or less able to exert self-control when the next challenge comes.
    • Study: Tired and/or hungry judges tend to fall back on easier solutions.
  • One of the main functions of System 2 is to monitor and control thoughts and actions "suggested" by System 1.
  • Recurrent theme of the book: Many people are overconfident, prone to place too much faith in their intuitions. (cognitive effort with System 2 is at least mildly unpleasant)
  • Stanovich & West have studied: What makes some people more susceptible than others to biases of judgement?
    • Stanovich draws a sharp distinction between two parts of System 2:
      • "algorithmic mind" (slow thinking and demanding computation) and "rational mind" ("engaged", more skeptical to their intuitions)
      • rationality should be distinguished from intelligence

The associative machine

  • System 1 does associations automatically - thoughts, memories, even physical reactions
  • Priming effect
    • Example study: exposure to a word causes immediate and measurable changes in the ease with which many related words can be evoked.
    • E.g. primed with "EAT", SO_P becomes more easily SOUP but primed with "WASH", SO_P becomes more easily SOAP
    • Priming is not restricted to words
    • Example, Florida effect
      • Priming with words related to aging caused subjects to walk slower
  • Priming is reciprocal (works both ways), some examples:
    • Walking fast primed for different things/words than walking slow
    • Another example of a reciprocal link: Being amused makes you smile <-> Smiling makes you more amused
    • Nodding up-down vs. shaking head side-to-side has effect on acceptance/rejection of a message
  • Primes guide our judgments and choices quite a lot
    • Example study of voting patterns for school funding - whether the voting station was in a school or just near one had an effect on the results
    • Another example study of money priming individualism
    • "Lady MacBeth effect"
    • System 2 believes that it is in charge, but System 1 also has an effect without us even noticing.
  • System 1 provides impressions that often turn into your beliefs, and is the source of the impulses that often become your choices and your actions.

Cognitive ease

  • Cognitive ease / strain
  • Causes (of cognitive ease) for example: Related experience, clear display, primed idea, good mood
  • Consequences: Feels familiar, feels true, feels good, feels effortless
  • Cognitive ease has multiple causes and they have quite interchangeable effects - it is difficult to tease them apart
  • Illusions of remembering (example of made-up celebrity names)
    • E.g. for a new word, making it easier to see/read makes it more likely to have the quality of pastness.
  • Illusions of truth
    • Frequent repetition makes people believe falsehoods; familiarity is not easily distinguished from truth.
  • How to make a persuasive message
    • Anything you can do to reduce cognitive strain helps: Legibility, simple language, memorable message (rhyming), name easy to pronounce
  • Strain and effort:
    • Reciprocity: Cognitive strain will activate System 2.
    • Study with Shane Frederick's Cognitive Reflection Test: Performance was better with a bad font. (Difficulty reading activates System 2)
  • The pleasure of cognitive ease
    • Mind at ease puts a smile on the face
    • Mere exposure effect
  • Ease, mood and intuition
    • Intuition works better when we are on a good mood

Norms, surprises, and causes

  • The main function of System 1 is to maintain and update a model of your personal world, which represents what is normal in it.
  • A capacity for surprises is an essential aspect of our mental life.
  • Norm theory
  • Our mind is eager to see causes and intentions.

A machine for jumping to conclusions

  • System 1 makes choices and draws conclusions automatically and does not keep track of alternatives, or even of the fact that there were alternatives.
  • Daniel Gilbert: For a statement, the initial attempt to believe is an automatic operation of System 1. Gilbert sees unbelieving as an operation of System 2.
  • Confirmation bias
    • When asked "Is Sam friendly?" different instances of Sam's behaviour will come to mind than would if asked "Is Sam unfriendly"
    • Positive test strategy
  • Halo effect
    • The tendency to like (or dislike) everything about a person - including things you have not observed.
    • The sequence in which we observe characteristics of a person affects how we view that person.
  • To derive the most useful information from multiple sources of evidence, one should try to make these sources independent of each other. (decorrelate error)
  • What you see is all there is (WYSIATI)
    • Associative memory represents only activated ideas.
    • Intuitive thinking is often jumping to conclusions on the basis of limited evidence.

How judgments happen

  • System 2 receives or generates questions - in either case it directs attention and searches memory to find answers
  • System 1 continuously monitors what is going on inside and outside the mind and continuously generates "basic assessments" of various aspects of the situation.
    • How are things? Is there a threat or an opportunity? ...
    • Example of a basic assessment - discriminating a friend from a foe at a glance.
    • E.g. Todorov's voting study
  • Sets and prototypes - System 1 deals well with averages but poorly with sums.
  • Intensity matching - System 1's scale of intensity allows matching across diverse dimensions.
    • Example: "Julie read fluently when she was 4 years old." -> "How tall is a man who is as tall as Julie was precocious?"
  • Mental shotgun
    • System 1 carries out many computations at any one time, automatically. Other computations are voluntary.
    • The control over intended computations is far from precise - we often compute much more than we need. (This is called the mental shotgun)

Answering an easier question

  • Normally we have intuitive feelings and opinions about almost everything that comes our way.
  • Question substitution:
    • The target question is the assessment we intend to produce.
    • The heuristic questions are the simpler questions we answer instead.
  • The mental shotgun makes it easy to generate quick answers to difficult questions without imposing much hard work on the lazy System 2.
  • Intensity matching helps to fit the answers to the original questions
  • Affect heuristic - in which people let their likes and dislikes determine their view of the world (Paul Slovic)

Thursday, November 7, 2019

Scott Wlaschin - Domain Modeling Made Functional

Notes on Scott Wlaschin's book Domain Modeling Made Functional - Tackle Software Complexity with Domain-Driven Design and F#.

Previous DDD books I've read: Domain Driven Design (Eric Evans) & Implementing Domain-Driven Design

As a summary, I enjoyed this book. The writing was pretty clear and compact, and the book made me consider F# an interesting option for building software.

Rough structure of the book

The first part of the book introduces DDD & its main concepts.

The second part goes through types & functions, especially in the F# context: how to use types to model the domain and how to model workflows as pipelines.

The third part first discusses how to implement the domain model, types and pipelines. After that, implementation-related concepts are discussed: error handling, serialization and persistence.

Fine-grained type system & DDD

An F#-like type system seems to make it quite cheap/easy to introduce lots of types, which allows representing the domain in a quite fine-grained way.

Some question marks for me:

  • How difficult might naming get if there are very many small types?
  • How much boilerplate code might it require to transfer e.g. an order through a pipeline of types like UnvalidatedOrder, ValidatedOrder, PricedOrder & PricedOrderWithShippingMethod etc.?

It would be interesting to try.

Summary of things & practices discussed

DDD

  • Aim to develop a deep shared understanding of the domain
  • Partition the solution space into autonomous, decoupled bounded contexts

DDD with F# kind of language

  • Before implementation, aim to capture the requirements with a type-based notation with both the nouns and the verbs
    • Nouns will usually be represented by an algebraic type system
    • Verbs will usually be represented by functions
  • Aim to capture business rules & constraints in the type system whenever possible.
    • "Make illegal states unpresentable"
  • Aim to design functions pure and total
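A tiny Haskell sketch of the idea (the book uses F#; the order types and the validation rule here are hypothetical): nouns as types, verbs as functions, and the type system makes it impossible to feed an unvalidated order to a step that requires a validated one.

newtype OrderId      = OrderId Int
newtype EmailAddress = EmailAddress String

data UnvalidatedOrder = UnvalidatedOrder { rawId :: Int, rawEmail :: String }
data ValidatedOrder   = ValidatedOrder   { orderId :: OrderId, email :: EmailAddress }

-- A verb as a function between workflow states; validation can fail, hence Maybe:
validateOrder :: UnvalidatedOrder -> Maybe ValidatedOrder
validateOrder o
  | '@' `elem` rawEmail o = Just (ValidatedOrder (OrderId (rawId o))
                                                 (EmailAddress (rawEmail o)))
  | otherwise             = Nothing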

Summary of FP techniques

  • Compose workflows from smaller functions
  • Parameterize functions when there's a dependency
  • Bake dependencies into a function using partial application (see the sketch after this list)
    • Allows composing functions more easily & hiding implementation details
  • Use special functions transforming functions into needed shapes
  • Solve type mismatch problems by lifting types into a common type
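A minimal Haskell sketch of baking a dependency in with partial application (Logger and placeOrder are hypothetical names):

type Logger = String -> IO ()

placeOrder :: Logger -> Int -> IO ()
placeOrder logFn orderId = logFn ("Placing order " ++ show orderId)

main :: IO ()
main = do
  let placeOrder' = placeOrder putStrLn   -- dependency baked in
  placeOrder' 42                          -- callers no longer see the Logger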

Friday, November 1, 2019

SQL Performance Explained

This time I read SQL Performance Explained by Markus Winand. The main theme of the book is database indexing, which is probably the most important thing for developers to know about SQL database performance.

Note that there is "free web-edition" of the book Use the index, Luke - the web edition has sections matching the book sections (linked to below).

Anyway, I warmly recommend buying the ebook if you're interested in the subject - it has a pretty affordable price and packs a good bunch of knowledge into a pretty small package.

1 - Anatomy of an index

  • A database index is a redundant structure referring to the actual information, which is usually stored in a different place.
  • Two main data structures:
    • Balanced search tree (B-tree)
    • Doubly linked list
  • Balanced search tree contains leaf nodes.
    • Each leaf node is stored in a database block (smallest storage unit).
    • Each leaf node has index entries sorted.
    • Index entries contain the indexed columns and a reference to the "physical" table row.
  • There is also a doubly linked list connecting the leaf nodes.
  • Index lookup has three steps
    • Tree traversal
    • Following the leaf node chain
    • Fetching the table data from the "physical" table

2 - The Where Clause

  • Execution plan shows how the database actually executes an SQL statement. See DB-specific instructions
    • Created by query optimizer / query planner - usually "cost-based optimizer"
    • Optimization is usually based on statistics
      • Table size (both in rows and blocks)
      • Column level statistics - number of distinct values, range (smallest/largest values), NULL occurrences etc.
      • Index statistics - mainly tree depth
  • The author calls B-tree traversal the first power of indexing.
  • Concatenated indexes
    • Index across multiple columns
    • Note that the order of columns matters - the first column is always usable
  • Full table scan
    • Going through the whole table
    • Note that in some cases that can be the most efficient operation
      • DB might be able to read larger chunks at a time
      • No need for "random access"
    • Note: It is important how many rows are fetched from the index for filtering
  • Function-based indexing
    • Summa summarum: there are some details to be aware of!
    • By default aim to index the original data - usually that is the most useful info
  • Parameterized queries
    • Queries with bind parameters (also referred to as dynamic parameters or bind variables)
    • DBs with execution plan cache can reuse execution plan when using bind parameters
    • When using bind parameters, the optimizer has no concrete values to determine their frequency -> will always select the same execution plan
      • There are not too many cases where that affects the execution plan
  • Searching for ranges
    • Can utilize indexes
    • As a basic rule, keep the scanned range as small as possible.
    • As a rule of thumb, index first for equality and then for ranges (see the sketch after this list).
  • LIKE expressions
    • can only use the characters before the first wildcard.
      • -> Avoid LIKE expressions with leading wildcard
    • Be aware that with bind parameters the DB does not know about the leading wildcard
      • Work-around append hack: Instead of (foo LIKE ?) use (foo || '' LIKE ?). (Use only as a last resort)
  • Index merge
    • In most cases one index with multiple columns is better than multiple single-column indexes
    • Exception: Queries with multiple range conditions (E.g. FOO < ? AND BAR < ?)
    • Methods for combining indexes
      • Index join
      • Bitmap indexes (DWH, mainly not suitable for OLTP)
  • Partial indexes
    • Index only part of the rows
      • E.g. CREATE INDEX foo_bar ON foo (bar) WHERE baz = 123
    • Very common in queuing systems where most of the processing is for unprocessed entries.
  • Oracle NULL has some surprises, e.g.:
    • Oracle treats an empty string quite much the same as NULL
    • As a gotcha: Oracle does not include rows in an index if all indexed columns are NULL.
    • Can be used to emulate partial indexes.
  • Obfuscating conditions - where clause patterns that prevent index usage
    • Watch out when indexing DATE types - Summa summarum: Write queries for continuous periods as explicit range conditions.
    • Numeric strings - Summa summarum: Use numeric types for numbers.
    • Combining columns - Summa summarum: When combining multiple columns in a range condition, use a redundant condition.
    • ...
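A hypothetical SQL sketch of the "equality first, then range" rule of thumb for a concatenated index (the sales table and its columns are made up):

CREATE INDEX sales_emp_date ON sales (employee_id, sale_date);

-- The index supports this query well: equality on the leading column,
-- range on the second, so the scanned index range stays small.
SELECT *
  FROM sales
 WHERE employee_id = 123
   AND sale_date >= DATE '2019-01-01';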

3 - Performance and Scalability

  • Environmental parameters that affect performance
    • Data volume
    • System load
    • Hardware
  • Scalability - Impacts of data volume
    • Indexing
    • Pay attention to predicate information of the query plans
    • Pay attention to filter predicates
  • Scalability - Impacts of system load
    • Bigger hardware is not always faster but it can usually handle more load.
      • Same with scaling horizontally (more servers)

4 - The Join Operation

Three approaches:

  • Nested loops - Kind of server-side N+1 selects
  • Hash join - Query candidate records from another table into a hash table
    • This is the most commonly used one.
  • Sort merge - Kind of a zipper
    • Works well if the inputs are already sorted

5 - Clustering Data

  • "Clustering" here: Storing data that is accessed together close to each other to make data access require fewer IO operations.
  • The author refers to clustering as the second power of indexing.
  • Index predicates
    • Having a filtered value in the index can reduce table access.
    • Table access is not such a big deal if the accessed rows are stored in the same table block. (All rows will be read together in one read operation)
    • Index clustering factor - Correlation between order of row entries in the index and order of rows in the table.
    • As a tip, don't introduce a new index for filter predicates but extend an existing index.
  • Index-only scan
    • Performance difference depends on the number of accessed rows and index clustering factor.
  • As a special case, some databases support index-organized tables
    • That is, a B-tree without a heap table
    • Note that secondary indexes become very inefficient to use.

6 - Sorting and Grouping

  • Indexed ORDER BY execution
    • Saves sorting effort
    • Allows pipelined execution - returning the first results without processing all the data.
    • -> The author calls pipelined order by the third power of indexing
  • Remember that databases can read indexes in both directions.
  • GROUP BY - two different approaches
    • Hash approach - Use a temporary hash table
    • Sort/group approach - Sort the data first by the grouping key
      • Can use index to avoid sorting -> enables pipelined GROUP BY

7 - Partial Results

  • E.g. querying for top N rows.
  • Pipelined ORDER BY is very powerful to optimize these.
    • -> DB can optimize for partial result only if it knows it when planning the query
    • -> Inform the DB when you don't need all rows.
  • Paging
    • Deterministic sort order is needed
    • Two approaches
      • Offset method - Number of rows from the beginning
        • Vulnerable to drifting
        • Gets slower when going further.
      • Seek method - Search for the last entry of the previous page (see the sketch below)
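A hypothetical SQL sketch of the two paging approaches (the news table is made up; the row-value comparison in the seek query works e.g. in PostgreSQL):

-- Offset method: simple, but gets slower for later pages
SELECT * FROM news
 ORDER BY published_at DESC, id DESC
 LIMIT 10 OFFSET 100;

-- Seek method: continue right after the last row of the previous page
SELECT * FROM news
 WHERE (published_at, id) < (?, ?)   -- values taken from the previous page's last row
 ORDER BY published_at DESC, id DESC
 LIMIT 10;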

8 - Insert, Delete and Update

  • Avoid redundant indexes whenever possible
  • Note that the first index makes the greatest difference
  • With INSERT & DELETE, each index adds time spent.
  • With UPDATE - indexes on columns that are not updated don't generally add so much time.

Bonus

The author of the book also has a nice website, modern-sql.com, on interesting newer SQL features.

Thursday, May 9, 2019

Designing Data-Intensive Applications

After some pauses in the middle, I finally finished Martin Kleppmann's book Designing Data-Intensive Applications.

In short, it's a great book on different aspects of data: storing data, transferring data, processing data, distributed systems etc. etc.
The writer focuses on concepts rather than the details of individual tools. He also strikes a nice balance of theory and practice.

Each chapter has a nice old-school map as its TOC. For the full TOC, check out the cool poster.