Telephone: +44 7973 504232


Taking a different approach to Systems Architecture

 


In this paper I look at a major issue for those responsible for data and IT in small and medium-sized enterprises: how do you stitch together all those bits of systems and their data?


Executive Summary


  • One super system that supports all our company's requirements is difficult to update in a dynamic world.

  • A portfolio of component systems lets us treat those components in different ways and be more responsive, but it inevitably leads to our data being held in a number of silos, which brings problems of data ownership, consistency and integrity.

  • "Shadow" and departmental Software as a Service systems make it hard to have a clear idea of the data and functionality within a company.

  • IT can be seen as an investment and, like physical assets, it will require resources for ongoing maintenance.

  • Top tips on Managing the Morass


    1.       Tolerate Ambiguity: Don't over-analyse. Accept sub-optimal solutions when people and their behaviour are most likely limiting factors. Plan fluidly for a world in flux.


    2.       Be Proactive: Get out there, see what is happening, and constantly sell a better world of shared company data.


    3.       Interface and Reconcile: Build interfaces and reconciliation together, along with clear ownership of and responsibility for the data.


    4.       Focus on added value: Don't rush to install new systems in the hope of solving data problems.


    5.       Fix the broken bits first: the systems or data problems that fall into the "important, but not urgent" category. In other words, those that have been on the to-do/known-issues list for ever. Do this and you will get happier staff, happier customers, and maybe even a bit more profit.


    Silos


    One ideal target is that all our company data is held in one common database, which may be served by one or more applications. Data is held only once, it is accurate and up to date. This vision was widely held 5 years ago, but the world has changed dramatically.


    Even if we could marshal the resources and consensus to reach such a nirvana, it would soon be out of date as the rest of the world, and our business requirements, move on. Furthermore, such a monolithic system holds back the organisation. Updates to the database structure need to be overseen by a senior committee. Downstream problems caused by structure changes or backfilling data become the stuff of career nightmares. The effort and risk involved in updating the database structure leave the organisation unable to respond quickly, if at all. IT would be paralysed, and "shadow" systems (see below) would proliferate. Of course, you could replace a monolithic system with another monolithic system, either bought in or built in-house. If you have all the functionality you require, perfect data, and an unchanging business, then that could be a reasonable choice.


    In a dynamic world, we may prefer to treat our systems as a set of components that we can swap in and out. This enables us to write or buy new components to respond to new business, and to swap out components that can no longer be supported, or that no longer fit because regulations have changed or a supplier has changed its pricing model. It enables us to treat those components in different ways, for example being very strict about controls in a central accounting function, whilst prototyping with newer technologies for newer businesses. Inevitably, as many of these components will have their own "database", this moves us away from the idea of a piece of data being held only once. This creates problems with ownership, consistency and integrity, which constitute many of the real issues in most businesses' data management. Our data is held in several silos, and we need mechanisms to link them together, both for daily operations and for reporting, ideally with some means of allocating ownership and a source of truth for any particular datum (piece of data).


    The reality for most companies is much less tidy: they have a patchwork of systems, most of which have their own distinct set of data. There may be large bespoke internal systems and others created by a software firm, either of which may run within an environment created by an even larger software firm, on which you then have a dependency. Today there will probably be some software as a service in use, and often a website or two (or 20!) created by a marketing or advertising agency. A larger company may have duplicate systems from a merger or takeover, and there are probably legacy systems that are no longer in active use, but for which nobody could find the enthusiasm to migrate and tidy up the data. Some interfaces will work well and some will not. Some will be overnight batch processes, so changes in one system may take a day or two to cascade down to other systems.


    Shadow Systems


    To make the management of this hodgepodge worse, there is likely to be a set of "shadow" systems. Shadow systems are those that have been created beyond the control or knowledge of the IT department. Some of the problems with them, mostly related to quality, are clearly enumerated in a short Wikipedia entry, to which I have added some points:


  • Spreadsheets are the favoured tool for shadow system creation, but we should not forget that users may use other packages such as word processors, or even manual files to create their own system.

  • Shadow systems are a sign of a gap. Mostly this is a lack of functionality in the existing systems, which users bridge on their own initiative, sometimes by using shadow systems to bring together data from different silos. However, it can also be a sign that users do not understand the functionality already available in the existing systems. Even if staff are trained when a system is introduced, there will be staff turnover, writing and reading manuals have gone out of fashion, and knowledge and understanding are lost. Thus end-users re-create functionality that already exists, and we end up with inefficiencies and multiple partial sets of data.

  • Such systems can serve as a useful prototype. They indicate the functionality that the users actually want, and potentially the extra data required to support that.

  • There is a case to be made that some departmental systems exist in the penumbra, if not quite the full shadow. Software as a service (SaaS) has made it easier for departments to implement and control their own IT services, such as a sales department running Salesforce. They will have their own budget. They will use their own resources, possibly with the help of an external agent, to get things up and running. They may tweak the software to better suit their requirements, or even use it like a development environment and create their own shadow systems within the SaaS platform. All of this can be invisible to the IT department, who are perceived only as control freaks who would slow things down and get in the way (along the lines of Mordac, the Preventer of Information Services, in the Dilbert cartoons).


    Software as a Service


    From a business viewpoint, SaaS is a good thing, particularly for small and medium-sized companies. It provides access to rich functionality for a fraction of the cost of bespoke development. You can be up and running quickly, with little business or technical risk. You don't need your own servers and data centre and all the infrastructure and staff costs that go with them. The system is delivered through your standards-compliant web browser, which means you can access it from multiple devices and operating systems: there should be minimal installation issues and IT support requirements. The physical boundaries of your office are no longer a constraint, and it is easy to extend your virtual company by giving access to advisers, customers and suppliers. We can conceptualise the SaaS silos as components of our overall system architecture. The more they are stand-alone silos, the easier they are to swap out for another system should the business requirements or the environment change. It also helps to have chunks of functionality of a size and scope that most people can cope with, and that serve a limited number of departments. Clear ownership and responsibility help get things done.


    However, there are plenty of traps here. In an ideal world we record data only once; we can then look after the integrity and consistency of our data. In our siloed world this is not the case. Each silo may have its own version of a customer table, for example, with a slightly different but overlapping set of customer data. Static datasets may differ between our silos: we might have different country codes, different regions, different date formats. One system might validate job functions against a list of 20, whilst another has 43,000 different values. The underlying conceptual models of the systems may differ, so it is difficult to map entities between systems, never mind fields. One system may have been conscientiously tended, whilst another has had data thrown in to meet an immediate client turnaround, or holds just an email address and nothing else. Relating the data in one system to another may not be possible in a reliable manner, and reconciliation is then impossible.
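
    To make the reference-data problem concrete, here is a minimal Python sketch of normalising two silos' values to a common form before any comparison is attempted. The silo names, the country mappings and the date formats are all hypothetical; the design choice worth noting is that an unmapped value becomes None rather than a guess, so it surfaces as an exception to review instead of a silent mismatch.

```python
from datetime import datetime

# Hypothetical mappings from each silo's country values to ISO 3166-1 alpha-2 codes.
COUNTRY_MAP = {
    "sales": {"UK": "GB", "United Kingdom": "GB", "USA": "US"},
    "accounts": {"GBR": "GB", "USA": "US"},
}

# Hypothetical date formats used by each silo.
DATE_FORMATS = {"sales": "%m/%d/%Y", "accounts": "%d-%m-%Y"}


def normalise(silo: str, record: dict) -> dict:
    """Return a copy of record with country and creation date in a common form.

    Countries with no mapping are set to None rather than guessed, so they
    show up as items to review when the silos are compared.
    """
    out = dict(record)
    out["country"] = COUNTRY_MAP[silo].get(record.get("country"))
    parsed = datetime.strptime(record["created"], DATE_FORMATS[silo])
    out["created"] = parsed.date().isoformat()
    return out
```

    With both records normalised, "UK" on 11/15/2018 in the sales silo and "GBR" on 15-11-2018 in the accounts silo compare as equal. Where no reliable mapping exists, the honest output is a list of unmatched values for someone to own, not a forced match.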


    The same departmental ownership that helps get things done can be obstructive and damaging for the business when trying to put together an approach that supports the whole company. There is also a danger that departments become, effectively, captive customers. This can be exacerbated where staff have become "certified" professionals who then want to continue to work in that particular SaaS environment. A narrow and incremental approach means a department, or indeed the whole business, can invest too heavily in customising SaaS to its needs when an internal application or fuller use of an existing system would be more appropriate. Our idea of being able to swap components in and out of our suite of applications will meet heavy resistance from those who currently have control or an emotional ownership of their system, those who dislike change, and those with skin in the game.


     


    IT as a cost


    There is a cost element to IT, the same as for power, water, general HR costs and other inputs. The danger in being too enthusiastic about driving down those costs is that companies lose sight of the fact that some IT costs are an investment: capital expenditure rather than expense. That investment may be more nebulous than a piece of large capital machinery standing on the factory floor, but like the physical assets it will require some maintenance effort if it is to continue to add value. When paring down those IT costs, we must be careful not to throw out the baby with the bathwater. Agile is a good approach for projects, but there is a danger that maintenance is missed, and that small updates are never quite important enough to get scheduled. This can then lead to disgruntled end-users and yet more shadow systems.


    Project plans are made on the assumption that IT staff are interchangeable, but as Frederick P. Brooks Jr. noted in "The Mythical Man-Month", some programmers are 10 times more productive than others, with no correlation between experience and performance. A good analyst/programmer will understand the business as well as the end-users do, and can then be proactive in suggesting ways forward and in guiding everyone through the fog of miscommunication that bedevils projects. In the past they would have continued a relationship with the users, probably doing the software maintenance. I worry that in an agile environment it is more difficult to sustain this working relationship between users and analyst/programmers, and that other IT staff deployed for a sprint will be less effective until they too are up to speed. Outsourcing such people might gain a manager a short-term bonus, but it is incredibly naïve.


    I fought some early skirmishes in the battle against the all-powerful and controlling corporate data centre when PCs were still establishing themselves as valid grown-up business tools. Committees decided what got done, which was executed on the in-house machines using the approved tools provided by the manufacturer. User interfaces and performance were of little importance. Things have moved on. Technology is now pervasive, and IT are much more responsive to the needs of the business. However, there can still be a sense that central IT will try to impose controls on departmental computing that will slow them down and restrict what they can do. They are Mordac made flesh, out to scupper the plans of those who are at the sharp end of getting the business done. This is grossly unfair. IT must operate with a high degree of quality control, change control and general rigour. They may be responsible for information security, data protection, data quality etc. for the whole organisation, including all those shadow and departmental systems over which they have little control, some of which they will not even know exist. Responsibility without control is not a recipe for success.



    Managing the Morass


    Tolerate Ambiguity

    This may seem a facile suggestion. It is anathema to most technical and data people, who work to bring order. We analyse, we classify, we fit things into known patterns so that we can re-use concepts and program code. We work towards a goal where everything is in order, effectiveness and efficiency are maximised, change is controlled, specification and testing are rigorous. But one of my starting points was that, even if we could fleetingly build an all-encompassing optimal solution, the world would move on and our super solution would render us unable to respond fast enough. We need to accept a set of component systems that will inevitably put our data into silos. We need to accept that some departments will want to push on with their own systems, and that they will probably get there faster without us doing it for them. We need to embrace shadow systems as a symptom of some sort of gap and thus an opportunity to do it better in the next iteration.


    Avoid analysis paralysis. We will obviously want that systems map for the office wall and the PowerPoint slides, but we probably don't need the data schema for all those third party and SaaS systems, especially as they have functionality we don't use. We can treat them more like a black box, or even a cloud, and focus on the outputs, and hence the inputs. It is more important we talk to the users and understand where the systems are adding value, or not. We need a holistic approach.


    It is also important that we accept reality. It is unlikely our data is complete, accurate and up to date, and whilst humans are involved in adding and maintaining it, this will continue to be the case. I am not suggesting we give up, but that we accept some imperfection and don't insist on data that is not required, or the end-users will find ways around our controls. Be prepared to implement changes that fix things in one area but are "no worse" in another. If we really must go through a data clean-up, then the scope must be clear and minimal, and the processes and procedures to keep that data clean must be in place and running; otherwise momentum will just fizzle out.


    Be Proactive

    Another cliché. My point is that rather than burying yourself in busy analysis and planning, you should put yourself about. There is a constant need to sell the use of data as a shared company resource, and a vision of how you can do it better. Most users grasp this anyway, but for others the concepts and operating models that are obvious to you will not be so obvious. A particular problem will be with those managers who want to control "their" data, and who can be very resistant to change. Some nagging may be required.


    Moreover, you need to sit down with the users and see and understand what they actually do and what data they operate on. What is the role of that job sheet? Why is that lever arch file on the desk? Why does the manager spend 30% of their time keying reporting data into Excel when the company has spent so much money on data lakes/warehouses or other "solutions"? An audit may be too formal a way to go about this, and might scare the owners of what could turn out to be an important part of your company data back into the shadows. But you really can't do your job if you don't understand what is out there in the shadows.


    You may need to sell the idea of IT as an investment, rather than a cost, and that IT are there to help rather than just prevent others from doing their job. As part of this IT should be looking to inveigle their way into the departmental and shadow systems in the hope of modifying the outcomes without being heavy-handed. Some oversight and conformity to standards can be gained by helping with the more technical aspects of departmental SaaS: customisation, data load, and the building of interfaces. This also gives you the chance to sell other options should the department want to customise their instance of the SaaS system into some byzantine castle in the clouds.


    I think you should also embrace the shadow systems and provide professional help to support and enhance them. This is not just because this is the space in which I mostly work. There is risk and inefficiency here, most of which is probably hidden in plain sight. A fair number of these systems are probably superfluous, covering functionality that may already exist or that an IT professional could extract very quickly from the "official" systems. Others could be combined, or share their data, or be rewritten to get their data via an interface or download. Tact may be required to change habits, as well as acceptance of some sub-optimal interim solutions, and care must be taken to avoid scope creep. Critically, these systems are brought into the light: efficiency and data sharing can be improved, risk is better controlled, and IT are seen as facilitators. One hopes that in due course the functionality of these shadow systems will be incorporated into other applications, but by then there will doubtless be other needs being met in the shadows.


     


    Interface and Reconcile

    We will have to interface our various systems to get the data from one silo to another. Most modern systems have facilities to enable this, and if not, we can probably find alternative ways to do it, such as going direct to the underlying database, or using a software robot. We can, therefore, do something about the inherent inefficiencies of our siloed data architecture. However, we will still have problems with data being held more than once, and in particular with keeping it all synchronised, lest we end up with multiple, conflicting versions of the truth. If we have written all the components, then we can ensure that data is not held more than once. More likely, we will have multiple systems each with some sort of customer table. For example, we can download customer data from system S (sales) to system A (accounting), where S is a SaaS system running in the cloud and A is packaged software running locally. We then have issues if the customer data is updated inside system A, as it then differs from the system S data. We may wish to insist that updates are done in system S, but system A will probably have fields pertinent to its function that system S does not, for example a customer credit limit. We could add a mirrored field to S, but can we trust the users of S to enter such data and get it right? Can we wait for the interface to copy the amended data from S to A? But if we let users change data in A, they will change some data that has come from S.
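
    One way to tame the S-to-A dilemma is to decide, field by field, which system owns each piece of customer data, and let the interface copy only the fields owned by S. The sketch below assumes a hypothetical ownership map and customers held as simple dictionaries; it leaves A's credit limit untouched and stamps S's identifier onto the A record so the two can be traced later. It does not stop users of A editing S-owned fields; catching that is the job of reconciliation.

```python
from typing import Optional

# Hypothetical field-level ownership: which system is the source of truth.
FIELD_OWNER = {
    "name": "S",
    "email": "S",
    "billing_address": "S",
    "credit_limit": "A",  # only meaningful in the accounting system
}


def sync_customer(s_record: dict, a_record: Optional[dict]) -> dict:
    """Build the updated system-A customer from system S's copy.

    Fields owned by S overwrite A's values; fields owned by A (such as
    credit_limit) keep whatever value A already holds. The input records
    are not modified.
    """
    updated = dict(a_record or {})
    for field, owner in FIELD_OWNER.items():
        if owner == "S" and field in s_record:
            updated[field] = s_record[field]
    # Keep S's identifier on the A record so the pair can be traced and reconciled.
    updated["s_id"] = s_record["id"]
    return updated
```

    The ownership map is the important part: it turns the vague question "which copy is right?" into a documented rule that the interface, the reconciliation and the users can all follow.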


    One way or another, our component systems will soon be out of kilter. This is the fundamental flaw in our data architecture consisting of many component systems. It is the chronic complaint we must accept in exchange for the flexibility and dynamism (and cost) advantages. However, we can try to ameliorate the effects, and this boils down to responsibility and reconciliation. Ownership and responsibility for data must be clear and accepted. This may have to extend down to field level in some common tables replicated across a number of systems. Like many procedural solutions, this will not stop the problem, but it will help highlight who must fix things, and probably who has transgressed, when the reconciliation process tells you the systems are not in agreement. Overall data quality will improve as users learn what data they can operate on, in which system.


    Putting the reconciliation process in place is an IT task, on the assumption that this will be mostly automated. Ideally the reconciliation reporting will only report anomalies, and these are passed to the relevant owners to rectify quickly. Even before an interface is in place, there should be a reconciliation process to oversee it, and identifiers in source and target systems so we can trace data. That means the reconciliation can be used as part of the interface testing, and then kept in place and monitored. This is especially important as IT, and the interface processes they have written, have the ability to create havoc with the data far beyond the capabilities of ordinary users, sometimes for months before this is noticed.
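
    As a minimal sketch of anomaly-only reconciliation, assuming both systems can export records keyed by a shared identifier, the function below compares chosen fields and groups any discrepancies by a hypothetical field owner. Records that exist on only one side are routed to "IT", on the assumption that they point to an interface fault rather than a user error. When the systems agree, it returns an empty report.

```python
from collections import defaultdict

# Hypothetical owners per field; in practice agreed with the business.
FIELD_OWNER = {"name": "Sales", "email": "Sales", "phone": "Sales"}


def reconcile(source: dict, target: dict, fields: list) -> dict:
    """Compare two systems' records and report only the anomalies.

    source and target map a shared identifier to a record (a dict).
    The result maps each owner to a list of (identifier, problem) pairs,
    so each owner sees only the discrepancies they must fix.
    """
    anomalies = defaultdict(list)
    # Missing records suggest an interface problem, so they go to IT.
    for key in sorted(source.keys() - target.keys()):
        anomalies["IT"].append((key, "missing in target"))
    for key in sorted(target.keys() - source.keys()):
        anomalies["IT"].append((key, "missing in source"))
    # Field-by-field comparison for records present in both systems.
    for key in sorted(source.keys() & target.keys()):
        for field in fields:
            s_val, t_val = source[key].get(field), target[key].get(field)
            if s_val != t_val:
                owner = FIELD_OWNER.get(field, "IT")
                anomalies[owner].append((key, f"{field}: {s_val!r} != {t_val!r}"))
    return dict(anomalies)
```

    Run during interface testing and then on a schedule, the same function serves both purposes described above, and an empty result is the normal, quiet case.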


    The critical point is that we cannot just install an interface and walk away. We need to design and put in place the responsibilities and processes to find and correct discrepancies, and resources are required to keep this a continual effort.


     


    Focus on added value

    Well obvs. Unfortunately, there is a whole set of incentives and behaviours, pervading companies from bottom to top, that do not encourage adding value for the business. There is scope here for another blog, or perhaps a book. For now, I just want to focus on the need to be seen to do stuff. Some careers are made by not doing things, and we have all probably seen examples of survivors rising through the corporate ranks whilst avoiding anything that involves risk and responsibility until they reach their "level of incompetence" (as postulated in The Peter Principle). Most go-getters, however, are keen to be seen to have added value by implementing new things. Read their CVs and they will be full of how they put in the new system that contributed some magnificent gain to the company's bottom line, or throughput, or some other unambiguous measure. You will struggle to find a section that says "saved the company loads of money, risk, and management time by not implementing a new system".


    There is a pervasive belief that problems with data can be solved by new systems. Systems do contribute to poor data quality, but in this day and age most issues are related to human behaviour: to understanding, to perverse incentives, to ambiguous ownership and responsibility, to a belief that the little bit of extra effort required is not worthwhile or appreciated. Pouring the questionable data from your old system into a new one will quickly lead to a loss of confidence in the new system. There are incentives for throwing old systems away, not least the wish of IT staff to keep abreast of the latest technologies for their careers. Systems will have a lifetime, but there are ways to keep otherwise functional systems running.


    The temptation is to come up with a grand plan, particularly if you use some consultants. This is, after all, why they are employed. This will be done after some in-depth analysis and will recommend the replacement of various systems either directly, or by incorporation into a smaller set of components to reduce the overall complexity. This may well be a valid approach. My suggestion is not to rush into this. If you spend on a new "solution" then you may find that the resources I have suggested for supporting end-user shadow systems and departmental SaaS, and for data reconciliation are no longer budgeted, especially if the new system has inherited, or generated, new problems with the data.


    Consider also that it is not just important that your data is good: your users must believe that the data is good. If the users do not have confidence in it then inefficiencies and shadow systems will creep in as they double-check or re-generate the data. Furthermore, they are unlikely to take the appropriate care when updating the systems if they think someone upstream has not done their bit. The lack of confidence in the data becomes self-fulfilling. The promise of a fix via a new system six months or a year down the road sends the wrong messages about responsibility and the importance of data to the organisation. If you can fix the issues that lead to poor data as a high priority then the dynamic nature of data means it will be in much better shape by the time you do replace systems, and the users are more likely to see how important this is.


    My appeal, then, is to avoid the mega plan: to be more heuristic. Only to get down to the bits and bytes where necessary. To tolerate ambiguity and sub-optimal solutions. To see the situation as in constant flux, to see that value is added by functionality for the users and the pertinent and accurate data required for that, not by new replacement systems per se, or data just for the sake of completeness. Focus on fixing the known niggles, plugging the gaps, interfacing and reconciling data, supporting the business, being an evangelist for sharing quality data, and looking at where the company is going, rather than implementing a static grand design.


     


    Postamble


    I hope you have found this helpful and thought provoking. I welcome comments to affirm or disagree with the points I have made. Please contact me if you would like help with managing your own morass, fitting component systems together, or making sense of a plethora of shadow systems.


    John.Davis@cranfieldsoftware.co.uk

    15 Nov 18.

     

    Further Reading

    "Dilbert Gives You the Business" by Scott Adams





    Why Hubspot CRM? - a consultant's perspective


    Over a year ago I elicited opinions from the LinkedIn community on the best choice of CRM for a small business. This crowdsourcing helped inform a project I was leading for a small but fast-growing business that is a heavy user of Microsoft Office, in particular Outlook and Excel. The client had used Salesforce in the past, but not for sales, and was not keen on its complexity and cost. Integration with Outlook was a key requirement, which narrowed the field to candidates that obviously included MS Dynamics and various Outlook add-ins. Following the review I recommended we go with Hubspot because:


  • The CRM and Hubspot Sales are free, so we could run a prolonged test at little cost.
  • The integration with Outlook email works well. Companies and contacts can be added to Hubspot automatically from the Sidekick Outlook add-in. Email opens and conversation trails can then be tracked in Hubspot.
  • It is clean and fresh with a clear conceptual model.
  • It is all in the cloud. Collaboration and multiple devices are all possible.
  • A useful API is available, even for the free version.

    There is little point in my listing all the features, most of which will be common to most modern CRMs, so I will highlight a few points, many of which have both advantages and disadvantages:


    Both the "Track Email" and "Log to CRM" options from the Outlook add-in work well, and the users like this.


    Hubspot will go off to the internet to try to fill in some missing details (company details from domain names, LinkedIn lookups), and this is helpful. But sales people being sales people, the data will be minimal unless you create some regime (and incentives) for them to come back and tidy it up. Don't expect to get much insight or target your marketing from the contact data unless you do this, but this is a universal issue.


    Adding new properties (fields) is easy, with a good selection of field types. I have a slight concern that creating new properties is too easy, and users may be tempted to do it without checking whether there is an existing field or functionality to meet their need, or they create a structure which meets an immediate need but is not normalised and quickly becomes unmanageable. This will need controlling, but again this is a universal issue.


    The "out of the box" reporting is minimal. The "Dashboard" looks nice at first glance, but most of it is "crippleware" unless you subscribe to the paid version. However, the filters on the browse pages for the major tables (Companies, Contacts, Deals) work well, including filtering by the different stages of your sales pipelines. You can save named filters, so there is less need for traditional reporting. With the necessary rights, you can download data into Excel.


    One can record only a single-level parent-child relationship between companies, i.e. a company can be a parent or a child, but not both. There is no data sync between parent and child companies, and no data rolls up to the parent company. This is sub-optimal, but it is also something that is very difficult to get right in any system without hideous complexity that baffles the ordinary user, and in this respect it sums up where Hubspot CRM sits: it does not have the functionality of Salesforce, but neither does it have the complexity and clutter. Most users will be up and running in minutes.


    Initial data load is very easy, and companies and contacts can be updated via an upload if you want to go for a soft start. However, deals can be added but not updated by this route. If you want to do a bulk update of deals then you will need to use the API.
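
    For illustration, here is a sketch of preparing a bulk deal update for the API. It assumes HubSpot's current CRM v3 batch-update endpoint for deals and a private-app access token; the API available when this was written was different, so check both the endpoint and the batch-size limit against HubSpot's documentation before relying on it. The property name `dealstage` and the value `closedwon` in the example are illustrative only.

```python
import json
import urllib.request

# Assumed endpoint and limit: HubSpot CRM v3 batch update for deals.
# Verify both against HubSpot's current API documentation.
BATCH_URL = "https://api.hubapi.com/crm/v3/objects/deals/batch/update"
BATCH_SIZE = 100


def build_batches(updates: dict, batch_size: int = BATCH_SIZE) -> list:
    """Turn {deal_id: {property: value}} into request bodies of at most batch_size deals."""
    inputs = [{"id": str(deal_id), "properties": props} for deal_id, props in updates.items()]
    return [{"inputs": inputs[i:i + batch_size]} for i in range(0, len(inputs), batch_size)]


def send(batch: dict, token: str) -> int:
    """POST one batch to the assumed endpoint and return the HTTP status code."""
    req = urllib.request.Request(
        BATCH_URL,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

    Splitting payload building from sending makes it easy to inspect or log exactly what will change before anything is written back to the CRM, which matters when an interface can do far more damage than an ordinary user.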


    Extensibility


    The ease of adding fields, and the possibilities opened up by the API, make it possible to extend the use of Hubspot's CRM. Whilst there is always likely to be some tailoring to fit a particular business, this opens up the possibility of building out workflows beyond just sales. If we can extend the deal record with a few extra fields so that production can use it to record and control their business, then we leverage the existing data, and sales even get a view of the progress of jobs. If we can create the invoices in our accounting system from completed deals, then we can reduce re-keying within the company and incentivise sales to get their data right upstream. There is a judgement call here in terms of how much can be moved into Hubspot, and once more it does not have the facilities of Salesforce, but there will be enough for many small businesses to integrate their business beyond just sales and marketing.


    We have used the API to report directly into Excel. Whilst this has saved the cost of the reporting module, we actually did it because there was some complexity in the reporting: for this particular business, customers mostly pre-pay annually and then draw down individual projects against the pre-payments. We track these as two different pipelines and then match it all up in Excel.
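
    The matching itself is simple arithmetic once the deals from both pipelines have been exported. A minimal sketch, assuming each exported deal carries hypothetical `company` and `amount` keys: add up the pre-payments, subtract the draw-downs, and look at the balance per company. Decimal is used to avoid floating-point rounding on money.

```python
from collections import defaultdict
from decimal import Decimal


def drawdown_balances(prepayments: list, drawdowns: list) -> dict:
    """Return the remaining pre-paid balance per company.

    Each deal is a dict with hypothetical 'company' and 'amount' keys.
    A negative balance means more work has been drawn down than was pre-paid.
    """
    balance = defaultdict(Decimal)
    for deal in prepayments:
        balance[deal["company"]] += Decimal(deal["amount"])
    for deal in drawdowns:
        balance[deal["company"]] -= Decimal(deal["amount"])
    return dict(balance)
```

    Negative balances, and companies with draw-downs but no pre-payment at all, are exactly the exceptions a sales manager would want highlighted.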


    Having set off down this road, we now do much more reporting via the API to Excel. By providing hyperlinks back to Hubspot, we find that most of the sales users actually use the Excel reporting as an index to their work, and sales management use it at both summary and detail level.
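
    A sketch of the hyperlink idea, assuming the report is written as CSV and opened in Excel, which treats a cell beginning with "=" as a formula. The deal URL pattern below is an assumption; copy a link to a deal from your own portal to confirm it before use.

```python
import csv
import io

# Assumed URL pattern for opening a deal in the Hubspot web app; check it
# against a link copied from your own portal.
DEAL_URL = "https://app.hubspot.com/contacts/{portal_id}/deal/{deal_id}"


def deals_to_csv(deals: list, portal_id: int) -> str:
    """Write deals as CSV text, with an Excel HYPERLINK formula for each deal name."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Deal", "Stage", "Amount"])
    for deal in deals:
        url = DEAL_URL.format(portal_id=portal_id, deal_id=deal["id"])
        # Double any quotes in the name so the formula's string literal stays valid.
        name = deal["name"].replace('"', '""')
        writer.writerow([f'=HYPERLINK("{url}","{name}")', deal["stage"], deal["amount"]])
    return buf.getvalue()
```

    Clicking the deal name in the spreadsheet then takes the user straight to the record, which is what turns a report into a working index.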


    Conclusion and Recommendation

    There have been implementation issues, but these are mostly associated with organisational behaviour and would apply to any CRM implementation. I will cover them in a follow-up blog.


    Some technical confidence and discipline will be required to set up default field choices, possibly add new fields, load data, and create filters etc. This would not be beyond competent users with a feel for data, of the type you might find in a modern marketing department. Finding someone who has done it before would accelerate the process as long as they can see your particular needs.


    Utilising the API needs professional IT skills. If you can combine this with someone with business understanding and vision, this really opens up the possibility of integrating the CRM into your business, both to leverage the data, and also to build your workflows into and on top of the CRM. This is more of an investment, but provides a path for a small business to grow without losing control.


    Hubspot CRM and sales are certainly working well for our client small business, and I am happy to recommend this path to other small and medium sized enterprises.


    I hope this blog is helpful and I welcome comments; but of course if you are looking for more specific help with #Hubspot do get in touch - no sales pitch just genuine independent advice!


    John Davis
    20 Jul 18.


    Data alright?


    We may be wrong to focus on our data being right, when we should be more concerned that it is the right data. There is a parallel here with efficiency and effectiveness: doing the thing right and doing the right thing. We can put lots of effort into honing our processes and interactions so that they are efficient, but this can be to no avail if we are not doing the right thing. All very obvious, you might think, but in reality many people have conceptual problems with that abstract stuff we call data; they will ask people like me to check whether their data is right and miss the more important question of whether they hold the right data. I recently examined a database for an application that had been in use for some years. The quality of the data on most measures was excellent. The curious incident was what was not there, including email addresses and mobile phone numbers.


    The data protection principles tell us that data should be right in both ways. The right data comes first, in principle 3: "Personal data shall be adequate, relevant and not excessive...", or as my wife might specify it: "enough, but not too much". Principle 4 tells us the data should be right: "Personal data shall be accurate and, where necessary, kept up to date". We are struggling with terminology here, as we lack words that sum up these two dimensions of rightness as neatly as efficient and effective do, so I'm going to use pertinence to talk about the right data, and accuracy to talk about the data being right.


    We can measure, quantify and report on both efficiency and data completeness and accuracy, and plot ways to fill any gaps in the existing record. The questions of effectiveness and whether we are recording the pertinent data are much more open and difficult to quantify. Whilst some things may be definitely wrong, it is difficult to be certain that we ever have it as good as it could be, but we can be pretty certain that the world will move on and invalidate both our accuracy and our pertinence. Focusing on pertinence more than accuracy, we move from the certainty of analysing data quality to business analysis, where we will need to take account of the dynamism of both the data and the business, and of the differing frames of reference of different users.
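
    Completeness, at least, is easy to measure. This minimal Python sketch assumes records arrive as dictionaries and uses hypothetical field names; it reports the share of records in which each field is populated. Note what it cannot do: it only reports on the fields you ask about, so noticing that email address or mobile phone number belongs on the list at all is the pertinence question.

```python
def is_filled(value):
    """A value counts as populated if it is present and not just whitespace."""
    return value is not None and str(value).strip() != ""


def completeness(records, fields):
    """Return {field: share of records (0.0 to 1.0) in which that field is populated}."""
    total = len(records)
    profile = {}
    for field in fields:
        filled = sum(1 for record in records if is_filled(record.get(field)))
        profile[field] = filled / total if total else 0.0
    return profile


if __name__ == "__main__":
    contacts = [
        {"name": "Ann", "email": "ann@example.com", "mobile": ""},
        {"name": "Bob", "email": "  "},
        {"name": "Cat"},
    ]
    for field, share in completeness(contacts, ["name", "email", "mobile"]).items():
        print(f"{field}: {share:.0%} populated")
```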


    Within our companies, what is considered pertinent is going to vary by department. A contact record with just an email address may be considered a valid lead, and if you measure your sales or marketing effort on new leads then you will probably have lots of these. In the business world we may be able to extract some information from the email domain, but the marketer tasked with deriving meaningful insight from this data has a lot to do, and would prefer all fields to be populated with accurate data. An operations person might be uninterested in the age of your contact, but would like the address and postal code to be correct. They may also be interested in any references stored to other systems, so they can link the data to the corresponding data elsewhere. An accounts person might be interested in something as basic as whether this is a new or existing customer, and the high level of duplicates in many systems shows this is not as simple a question as it sounds. Of course, you can try to enforce some mandatory fields, but the side effect can just be bad data, as users enter anything to get past a mandatory field that they see as stopping them doing their job. Mostly, bad data is worse than no data, as we may go on to make assumptions based on it.
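
    As an illustration of getting something from the domain, this Python sketch pulls the domain out of each contact's email address and groups contacts by it, ignoring a few personal-mail providers that say nothing about the employer. The short FREE_MAIL list and the record layout (id and email keys) are assumptions for illustration only. Several contacts sharing a domain is a hint that a "new" lead belongs to an existing customer, or is a duplicate; it is not proof.

```python
from collections import defaultdict

# Hypothetical, deliberately short list of personal-mail domains; a real list would be longer.
FREE_MAIL = {"gmail.com", "hotmail.com", "outlook.com", "yahoo.co.uk"}


def email_domain(email):
    """Lower-cased domain part of an email address, or None if missing or malformed."""
    if not email or "@" not in email:
        return None
    domain = email.strip().lower().rsplit("@", 1)[1]
    return domain or None


def group_by_company_domain(contacts):
    """Group contact ids by business domain, skipping free-mail and unusable addresses."""
    groups = defaultdict(list)
    for contact in contacts:
        domain = email_domain(contact.get("email"))
        if domain and domain not in FREE_MAIL:
            groups[domain].append(contact["id"])
    return dict(groups)


if __name__ == "__main__":
    contacts = [
        {"id": 1, "email": "Ann@Acme.com "},
        {"id": 2, "email": "bob@acme.com"},
        {"id": 3, "email": "carol@gmail.com"},
        {"id": 4, "email": None},
    ]
    for domain, ids in group_by_company_domain(contacts).items():
        if len(ids) > 1:
            print(f"{domain}: contacts {ids} may be the same customer")
```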


    Even in well controlled systems we may have problems where we try to categorise data. Too few categories and we lose precision in our insight. Too many and we similarly lose precision as the users will pick either the first one or an overall bucket. Enabling users to add their own categories may lead to chaos, as once you are beyond 30 or 40 categories users will add their own rather than try to find the closest match. I've seen many thousands of job functions in a system. Our categorisation, like our overall data, needs to be pertinent.


    The world is dynamic, and our data will decay over time, in both accuracy and pertinence. General accuracy will probably decay faster than pertinence, but the ways we reach customers, and they find us, keep developing, so you will need to review pertinence regularly. As with general accuracy, do not be afraid to throw away the old stuff. Not only will this help you see the wood for the trees, it is a data protection requirement. You may need to learn to tolerate some ambiguity, particularly as regards completeness. This can be difficult for those of us who have grown up with fully populated data, and it makes query writing and reporting rather more complex, but the reality of the modern world is that we increasingly start with minimal data and then update and enhance it when and where we can. This can be quite difficult, but at least it will be effective if you have focused on the right data before striving to get the data right.
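
    Throwing away the old stuff starts with finding it. This minimal Python sketch assumes each record carries a last_updated date (a hypothetical field) and splits records into current and stale against a review window. The window is a parameter, not a recommendation: the right period is a business and data protection decision, and the stale list is a set of candidates for review rather than something to delete blindly.

```python
from datetime import date, timedelta


def split_by_age(records, today, max_age_days):
    """Split records into (current, stale) by their 'last_updated' date.

    A record is stale if it has not been updated within max_age_days of today.
    """
    cutoff = today - timedelta(days=max_age_days)
    current, stale = [], []
    for record in records:
        (current if record["last_updated"] >= cutoff else stale).append(record)
    return current, stale


if __name__ == "__main__":
    records = [
        {"id": 1, "last_updated": date(2017, 12, 1)},
        {"id": 2, "last_updated": date(2014, 3, 1)},
    ]
    current, stale = split_by_age(records, today=date(2018, 1, 10), max_age_days=3 * 365)
    print("review for deletion:", [r["id"] for r in stale])
```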


    John Davis
    10 Jan 18.



    That's another fine (data) mess you got me into.


    As part of my recent audit work I have mused with colleagues as to how an organisation populated with intelligent, committed, conscientious and well-intentioned individuals can get itself into such a fine mess with its data and/or systems. Generally, things have improved as new methodologies have emerged and been honed. Nevertheless, there is still plenty of scope to go wrong, and I am going to contend that mostly this boils down to old-fashioned management issues rather than anything inherently technical. My particular worry is that managers are not taking responsibility.


    I have been struck by how IT departments are now viewed as a cost and have been pared down. Where development work is still done in house, it may be done in sprints under an agile methodology. This is fine, and a good way to get development done with the involvement of all parties, based on quick prototyping. The danger is that issues that come to light after a sprint may be left to fester, particularly if they seem small. Big bugs are easy to spot and should be caught in testing; if not, they will still get fixed quickly when found. It is the little bugs, or missing functionality, that are more insidious. They may just quietly and gradually corrupt your data. That incomplete lookup will cause wrong categorisation. The missing field will mean that another user field is doubled up, its two values separated by a comma; or was it a semicolon? Such problems may leave users running a little manual fix, or more likely keeping a log of some sort on a shared spreadsheet. The resolution could be a small fix, but it doesn't fit into a sprint anywhere. IT have moved on. The close link between system and developer, possibly intermediated by a business analyst, has gone. Nobody is responsible.
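
    The comma-or-semicolon problem gives a feel for the manual fix users end up running. This minimal Python sketch assumes the doubled-up field arrives as a single string and splits it on either separator. It is a workaround, not a cure: a genuine value containing a comma, such as "Smith, John", will be split wrongly, which is why the real resolution is the missing field.

```python
import re

# Either a comma or a semicolon, with any surrounding spaces, counts as a separator,
# because users were not consistent about which one they typed.
SEPARATOR = re.compile(r"\s*[;,]\s*")


def split_doubled_field(value):
    """Split a field that has been used to hold several values; drop empty fragments."""
    if not value:
        return []
    return [part for part in SEPARATOR.split(value.strip()) if part]


if __name__ == "__main__":
    for raw in ["Sales, Marketing", "Sales;Marketing ; Finance", ""]:
        print(repr(raw), "->", split_doubled_field(raw))
```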


    The problems run deeper in our organisation populated with intelligent, committed, conscientious, well-intentioned and thoroughly nice individuals. Being well educated and experienced, they are all managers. It will say so in their job title. They may manage external relationships, or, if they have any direct reports, they are managing other managers. It is like an army with officers but no sergeants. Supervisors seem to have been managed away. Nobody is supervising. With them has gone the quality control of data at a record level: the data that the rest of the organisation runs on.


    Eventually the organisation will realise its systems/data are not fit for purpose and will investigate. Surprisingly often the perceived fix will be to build or buy a new system. There is an optimistic belief that somehow new systems will cure organisational/management problems. Even where the old system was a bespoke internal development. I could go into the reasons why this may seem a good idea to management, but that may send the cynicism way off the scale for one short blog.


    But don't worry. These are not issues you need to fret about; it is not your responsibility.


    John Davis
    7 Aug 17.



    Latest Analysis Secrets.

    Courtesy of Rudyard Kipling


    I keep six honest serving men
    They taught me all I knew
    Their names are WHAT and WHY and WHEN
    And HOW and WHERE and WHO.


    John Davis
    27 Nov 14.



    The importance of the old school tie in the internet age.

    There was an excellent "Schumpeter" article in the 18th October edition of the Economist on how many of the predictions of the impact of the internet have been wrong. Not just slightly wrong, but completely out of phase with reality. Schumpeter looked at these three predictions:


    It would be possible to write books on the failure of each of these three predictions, but it is the lack of disintermediation that I find particularly interesting. Having talked to various recruitment agencies recently in my search for contract IT work I can say that they seem to perform two functions. One is a simple pattern-matching exercise to match candidate expertise to job specification. They have lots of CVs to process, so this seems to be a strict exercise with no room for thought outside the box. It could be done by a machine. The second stage is to call and validate what the filtered set of candidates have said they can do. I have done some technical interviews. Typically they don't last long, with the candidate soon apologising for putting some acronym on their CV when actually they have only had the most fleeting exposure to it.


    It seems then that intermediaries continue to thrive, and new forms of intermediation will grow, for two reasons. Firstly, the virtual world is just too big. Many people would not know where to start searching, and even if they did they may not have the time and knowledge to sift through all the data that would be returned. Secondly, that world is full of companies and people who might not be all that they first seem. Who can you trust? Most people would know that online reviews are open to abuse, and would treat them as only a rough guide. They may not have realised until recently that those price comparison sites will act as a broker, taking commission and quite possibly not showing them the best deal if more commission is available on other deals. Who guards the guards?


    So how do people cope? They cope by relying on brands. They cope by relying on personal networks: the old school tie, colleagues and local contacts (we're back to the importance of proximity again). There has been disruption, but not the huge disruption that was forecast. Instead, the internet has increased barriers to entry and reinforced old behaviours. I forecast the importance of brands 15 years ago, but didn't realise the importance of contacts, otherwise I might have gone to more old school and college dinners and joined LinkedIn earlier.


    John Davis
    13 Nov 14.



    Data Issues going around in Circles



    Whizzing round the Olympic velodrome last weekend with my family was great fun, if slightly terrifying as you go up the banking. Not surprisingly, data was the last thing on my mind! But at the end we received certificates which were all misnamed. This is pertinent for a couple of reasons. Firstly, it shows there is a manual interface in place. Somebody had copied our names from a screen (our online booking) or a printout onto a piece of paper, and passed that paper to someone else to create the certificates on another system. That interface was obviously seriously error-prone. Secondly, my name is John Davis. It is quite common (I blame my parents), and most clerical workers would know that it could be John or Jon, and Davis might be Davies. My son, Guy Davis, became Gille Davies.




    So the great experience was tainted by a certificate that was wrong for every single member of the family. No one had thought through the consequences of poor data quality and taken a bit more care. We won't be framing it and putting it on our wall, or tweeting a picture of the certificate or sharing it on Facebook, which loses the Velodrome valuable free marketing. These issues with missing interfaces and poor data quality are the same ones that I came across when I started working in computing and business analysis in the 1980s. Things may have improved for a while, but now that companies have some applications in the cloud we have returned to a situation of many silos, often with imperfect interfaces between them.


    Data quality is an issue that will run and run. This is partly a matter of supervision, and of reinforcing the importance of getting data correct before you can rely on it to drive your business (it needs that great MBA stalwart, "Senior Management Commitment"). This will get no easier as the workforce is infiltrated by a younger generation who are accustomed to txting and spreading their focus across a number of apps on a number of devices, with a commensurate loss of attention to detail (am I sounding old now?). Nevertheless, we can do better: taking out scribbled manual interfaces by calling APIs, using our systems to do more validation and checking at the point of original data capture, and reconciling data between silos and against other sources. Not only is this more efficient, but companies get happy customers, as it shows they care about the little things: our names, and who we are!
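
    Reconciling data between silos can be as simple as checking one system's copy against the other's. This minimal Python sketch assumes both systems share a booking reference as a key, which is an assumption (in the velodrome case the only link was a piece of paper). Names are compared after ignoring case and spacing, so only genuine spelling differences such as Guy Davis and Gille Davies are flagged, along with any booking that never reached the second system. It deliberately does not treat Jon and John as the same name: which spelling is right is a question for the customer, not the code.

```python
def normalise(name):
    """Collapse whitespace and ignore case, so only real spelling differences remain."""
    return " ".join(name.split()).casefold()


def reconcile_names(booked, printed):
    """Compare names keyed by booking reference.

    Returns {reference: (booked_name, printed_name_or_None)} for every mismatch.
    """
    mismatches = {}
    for ref, name in booked.items():
        other = printed.get(ref)
        if other is None or normalise(other) != normalise(name):
            mismatches[ref] = (name, other)
    return mismatches


if __name__ == "__main__":
    booked = {"B1": "John Davis", "B2": "Guy Davis"}
    printed = {"B1": "john  davis", "B2": "Gille Davies"}
    for ref, (expected, actual) in reconcile_names(booked, printed).items():
        print(f"{ref}: booked as {expected!r}, printed as {actual!r}")
```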


    If you need help with your processes, data, or interfaces, then give me a call. I'm the red and black blur.


    John Davis
    4 Nov 14.