
"Find or Create" functions: a discussion


 

Hi, folks. An old issue came back to the surface this month in consultation with clients and I'd like your opinion. It regards the old "find or create" pattern. It seems to violate Command/Query Separation (as I understand it), but it seems handy and harmless, so I'd like to find out more about what you folks think about it. Benign? Problematic?

I imagine using this with the Repository pattern. Let's say we register a patient in a medical environment and so we need a UI that reduces as much as possible the number of steps. We don't want to force the user to look up a patient just to discover that the hospital has no record of them, so we allow the user to enter some basic identifying information. This information suffices to either find an existing patient or create a new one if our database doesn't know that patient. The result is something like

Patient registeredPatient = patientRepository.findOrCreate(patientIdentifyingInformation);

The identifying information might have basics like name, date of birth, it doesn't matter. We can guarantee that registeredPatient now represents an Entity in our system, either because we found someone that matched the identifying information or because we created one.

This appears to violate CQS, but it seems like a good thing to have. Some individuals struggle with this, because they don't know whether this is an area where CQS "doesn't matter" or an area where CQS is trying to teach them something and they can't see what they're meant to learn. I haven't thought about this in depth in years, so I feel the same way right now. Drawbacks? Alternatives?

I was also thinking about how to design this, and it seems to me like a special case of getOrAbsent(), so that I could implement the generic findOrCreate() algorithm with something like

repository.find(identifyingInformation).orElse(T::createFromIdentifyingInformation)

where find() takes a TIdentifyingInformation and returns Maybe T, and T has a named constructor for creating a T from a TIdentifyingInformation. I'm assuming here that TIdentifyingInformation is enough to provide all the mandatory properties of T.

With this design, I don't need a single findOrCreate() function any more, because the pieces find() and orElse(T::create) communicate the idea well enough.
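
As a rough illustration of the composition described above, here is a minimal Java sketch. All the names (PatientId, Patient, PatientRepository, FindThenCreateSketch) are hypothetical. It assumes the identifying information alone is enough to build a Patient, and that a newly created Patient must be added to the repository explicitly. Java's Optional.orElse takes a value rather than a function, so the sketch uses orElseGet.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical types for illustration; none of them come from the original post.
record PatientId(String name, String dateOfBirth) {}

record Patient(PatientId id) {
    // Named constructor: assumes the identifying information alone
    // supplies every mandatory property of a Patient.
    static Patient createFrom(PatientId id) {
        return new Patient(id);
    }
}

class PatientRepository {
    private final Map<PatientId, Patient> rows = new HashMap<>();

    // Pure query: never changes the repository.
    Optional<Patient> find(PatientId id) {
        return Optional.ofNullable(rows.get(id));
    }

    // Command: stores the patient and returns nothing.
    void add(Patient patient) {
        rows.put(patient.id(), patient);
    }
}

class FindThenCreateSketch {
    // The query and the command stay separate; the caller composes them.
    static Patient register(PatientRepository repository, PatientId id) {
        return repository.find(id).orElseGet(() -> {
            Patient created = Patient.createFrom(id);
            repository.add(created);
            return created;
        });
    }
}
```

This version is not safe under concurrent callers: two of them could both see an empty result and both create a patient.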

Thoughts? I'm happy to see the discussion meander. Is this a totally-solved issue and there's one clear good way to proceed? or is it more a matter of context or preference?
--
J. B. (Joe) Rainsberger


 

Apologies for brevity. I'm on my phone.

I think it depends on how transactions are being managed. If called within a transactional scope, I'd be happy with separate "findPatient" that returns an optional patient, and a "createPatient" that takes the minimal properties needed to record a new patient in the system. That would allow the properties used to find a patient and the properties used to create a new patient to vary as the application requires.

But if transactions & concurrency control are being managed behind the API, they may _have_ to be combined into a single operation.
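
One way to picture the "single operation" case is an in-memory stand-in where the store itself guarantees atomicity. The Java sketch below uses ConcurrentHashMap.computeIfAbsent, which performs the lookup and the insertion as one atomic step per key. The type names (PatientKey, PatientRecord, AtomicPatientRepository) are hypothetical, and a real database would need its own mechanism, such as a unique constraint or a transaction.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical types for illustration only.
record PatientKey(String name, String dateOfBirth) {}

record PatientRecord(PatientKey key) {}

class AtomicPatientRepository {
    private final ConcurrentMap<PatientKey, PatientRecord> rows = new ConcurrentHashMap<>();

    // Lookup and creation happen as one atomic step per key, so two
    // concurrent callers with the same key cannot both create a record.
    PatientRecord findOrCreate(PatientKey key) {
        return rows.computeIfAbsent(key, PatientRecord::new);
    }

    int size() {
        return rows.size();
    }
}
```

Here the caller cannot split the operation in two without losing the guarantee, which is the situation where combining find and create seems hard to avoid.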


On Sun, 24 Nov 2019 at 08:55, J. B. Rainsberger <jbrains762@...> wrote:
> Hi, folks. An old issue came back to the surface this month in consultation with clients and I'd like your opinion. [...]


 

Pragmatically, in the vast majority of applications much less data is required for finding an entry than creating it. Creating an entry with so much missing data can create data integrity problems.

So, given that findOrCreate() should need all the data that Create() would need (not just identifying information), I would find it cleaner to just allow Create to return the object whether it was created or already existed (and signal that the entry already existed if the client code cares).
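
A minimal Java sketch of that idea, under some assumptions: create() takes the full data a new record needs, matches existing records on name plus date of birth (a hypothetical matching rule), and reports whether the entry already existed. The names NewPatientData, StoredPatient, CreateResult and CreatingRepository are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical types: everything create() needs, not just identifying information.
record NewPatientData(String name, String dateOfBirth, String address) {}

record StoredPatient(int id, NewPatientData data) {}

// The result carries the patient and a flag for callers who care.
record CreateResult(StoredPatient patient, boolean alreadyExisted) {}

class CreatingRepository {
    private final Map<String, StoredPatient> byIdentity = new HashMap<>();
    private int nextId = 1;

    CreateResult create(NewPatientData data) {
        // Assumed matching rule: name plus date of birth identifies a patient.
        String identity = data.name() + "|" + data.dateOfBirth();
        StoredPatient existing = byIdentity.get(identity);
        if (existing != null) {
            // Nothing is overwritten; the stored record is returned as-is.
            return new CreateResult(existing, true);
        }
        StoredPatient created = new StoredPatient(nextId++, data);
        byIdentity.put(identity, created);
        return new CreateResult(created, false);
    }
}
```

Client code that does not care can read only result.patient(); code that does care can branch on result.alreadyExisted().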



 


When developing web applications, we should bear in mind that some people do not want to have, e.g., any citizenship; this would lead to dirty data.


Kind regards,

Ahmet Murati

Software developer & Translator

E-mail: ahmet.murati@...
E-mail: ahmetmurati@...
Mobile: +4915123015776


From: J. B. Rainsberger <jbrains762@...>
Sent: Sunday, November 24, 2019 9:55:36 AM
Subject: [testdrivendevelopment] "Find or Create" functions: a discussion

> Hi, folks. An old issue came back to the surface this month in consultation with clients and I'd like your opinion. [...]


 


In the case of identifying for medical reasons, it would be vital to distinguish the person: a "find or create" method would invite duplicate records, and loss (or at least unawareness) of other medical history.

On 24 Nov 2019, at 14:01, Steve Gordon <sgordonphd@...> wrote:

> Pragmatically, in the vast majority of applications much less data is required for finding an entry than creating it. [...]


 

First things first: it looks strange to look for something by supplying the values it should fetch.
On the UI side I would usually have a get that fetches data to populate the state of my application. At some point I could decide to send a request to create something, but I would do so explicitly.

To continue with your question: to me, it looks like findOrCreate is doing one thing too many.

If you are experiencing performance problems, it could be alright to do so. But in the long run, it would be best to split it into two actions, as you stated.
If you have high concurrency you might also think about wrapping these two actions in a transaction, but if the race is not a tight one, you might survive without any transaction at all.

My 2 cts, have a nice evening,

Yoann





 

I like to separate between the ideal and pragmatic designs.

If an idealized model says the record should already exist, but for pragmatic reasons we haven't created it yet, allocating on fetch is perfectly reasonable. For example, initializing a friends list only when an actual friend is being added sounds fine to me. Treating the Command part of a Query as a hidden implementation detail is OK.

However, I get more concerned when the value being created is more complicated. Partially initialized objects are scary. Worse is if there might be business rules that prohibit certain values from being created. Having a get() throw an IllegalValue error would be surprising.

I'm fine with the general pattern for trivial values, but creating a patient record as a side effect feels a little off. Your one-line example crosses my threshold of complexity, but I can change the nouns and be fine with it.
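
The friends-list case might look like the Java sketch below. In the idealized model every user has a (possibly empty) list; in storage the list only appears when the first friend is added. FriendLists and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FriendLists {
    private final Map<String, List<String>> friendsByUser = new HashMap<>();

    // Query: a user with no stored list simply has no friends.
    // Nothing is allocated or stored here.
    List<String> friendsOf(String user) {
        return List.copyOf(friendsByUser.getOrDefault(user, List.of()));
    }

    // Command: the list is allocated on first use, as a hidden detail.
    void addFriend(String user, String friend) {
        friendsByUser.computeIfAbsent(user, u -> new ArrayList<>()).add(friend);
    }

    // Exposed only so the lazy allocation can be observed in this sketch.
    boolean hasStoredList(String user) {
        return friendsByUser.containsKey(user);
    }
}
```

The created value is trivial (an empty list), so no caller can end up holding a partially initialized object.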

-- Jeff Bellegarde




 

I think there's an issue of naming and grouping here too.

What we are calling findOrCreate sits in a soup of possible API actions for this sort of task. Some other common options include:
  • strict get - fail if not present, get if present
  • lazy get - default if not present, get if present
  • strict create - create if not present, fail if present
  • strict update - fail if not present, update if present
  • replace/put - create if not present, update if present
  • ensure - create if not present, ignore if present
  • strict delete - fail if not present, remove if present
  • lazy delete - ignore if not present, remove if present
Some of the confusion in this area comes from looking at the needs of the system through different "lenses". The CRUD view prioritises strict get, strict create, strict update, and strict delete. The REST view prioritises pure or lazy get, replace/put, and lazy delete.
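
To make the differences between the options listed above concrete, here is a small Java sketch of all eight over an in-memory map. KeyedStore and its method names are hypothetical, the choice of exception types is arbitrary, and values are assumed to be non-null.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical in-memory store; values are assumed to be non-null.
class KeyedStore<K, V> {
    private final Map<K, V> rows = new HashMap<>();

    // strict get: fail if not present, get if present
    V strictGet(K key) {
        requirePresent(key);
        return rows.get(key);
    }

    // lazy get: default if not present, get if present
    V lazyGet(K key, V fallback) {
        return rows.getOrDefault(key, fallback);
    }

    // strict create: create if not present, fail if present
    void strictCreate(K key, V value) {
        if (rows.containsKey(key)) {
            throw new IllegalStateException("already present: " + key);
        }
        rows.put(key, value);
    }

    // strict update: fail if not present, update if present
    void strictUpdate(K key, V value) {
        requirePresent(key);
        rows.put(key, value);
    }

    // replace/put: create if not present, update if present
    void put(K key, V value) {
        rows.put(key, value);
    }

    // ensure: create if not present, ignore if present
    void ensure(K key, V value) {
        rows.putIfAbsent(key, value);
    }

    // strict delete: fail if not present, remove if present
    void strictDelete(K key) {
        requirePresent(key);
        rows.remove(key);
    }

    // lazy delete: ignore if not present, remove if present
    void lazyDelete(K key) {
        rows.remove(key);
    }

    private void requirePresent(K key) {
        if (!rows.containsKey(key)) {
            throw new NoSuchElementException("not present: " + key);
        }
    }
}
```

In these terms, findOrCreate behaves like ensure followed by strict get.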

One of the many things I find myself explaining much too often is that REST is not CRUD. Conflating the two worldviews leads to all sorts of confusion.

Frank.



Ken Mccormack
 

Hi All

10 years since I posted here? Maybe! Hope everyone is well!

CQRS means that views are permitted to have less than absolute correctness, i.e. a scale of CAP, Brewer's theorem, etc.

So, when doing an insert, the same considerations will apply. I guess the question is "When do you need to know the command was successful? Immediately, or later on?" ;)

The find operation is just another view; how correct it needs to be will depend on the workload.

If you're worried about it, those newfangled distributed hashing algorithms would probably increase the odds of success.

Ken




 

I have a system that does this part of the time. It is fraught with problems when doing the search. In your case, searching for "John Smith" may return a set of records, while the creation makes a record that needs some unique identifier for the real John Smith you want to add to the database. Even just using first name and last name, or a partial SSN, etc., makes for complicated code: which keys are you going to use, and do those keys also use the SOUNDEX() feature of SQL? For those who will follow you, just make it a classical CRUD application without mixed or confusing methods or functions.
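
The "John Smith" problem can be sketched in Java as follows. The point is only that a name search is a query returning zero, one, or many candidates, so it cannot decide by itself whether to create. Person and PersonDirectory are hypothetical names, and the case-insensitive exact match stands in for whatever fuzzier matching (SOUNDEX or otherwise) a real system might use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only.
record Person(int id, String firstName, String lastName) {}

class PersonDirectory {
    private final List<Person> people = new ArrayList<>();
    private int nextId = 1;

    // Explicit create: every new record gets its own unique identifier.
    Person add(String firstName, String lastName) {
        Person person = new Person(nextId++, firstName, lastName);
        people.add(person);
        return person;
    }

    // Query: a name may match zero, one, or many records.
    List<Person> findByName(String firstName, String lastName) {
        return people.stream()
                .filter(p -> p.firstName().equalsIgnoreCase(firstName)
                        && p.lastName().equalsIgnoreCase(lastName))
                .toList();
    }
}
```

With two John Smiths on file, findByName returns both, and only the caller (ultimately a person at the UI) can say whether the patient in front of them is one of those or a third.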