Blog

Managing our list of lists on GitHub

Whilst we prototype Identify-Org.net we’ve been using AirTable to manage our research work.

However, long-term, we want to provide a way for many different contributors to add to our collection of known organisation identifier lists. So – we’ve started the move towards managing the authoritative copy of our list-of-lists on GitHub.

In the codes directory of our main repository you will find a directory structure with JSON files, one for each of the organisation identifier lists we are aware of. These files each follow a common structure, and contain the key fields we need provided in order to present and rank each list.


These files are compiled together to generate the authoritative identify-org.net codelist, and to drive the demonstration site at org.prefix.codes.
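As a rough illustration of that compile step, the sketch below gathers every JSON file under a directory into a single list. It is not the project's actual build script: the function name compile_codelist, and the assumption that each file holds exactly one JSON object, are made up for illustration.

```python
import json
from pathlib import Path


def compile_codelist(codes_dir):
    """Load every .json file under codes_dir (including sub-directories)
    and return the parsed contents as one list, in sorted path order."""
    entries = []
    for path in sorted(Path(codes_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            # Assumption: one organisation identifier list per file.
            entries.append(json.load(handle))
    return entries
```

Calling compile_codelist("codes") from the root of a checkout would return one entry per file, ready to be written out as a single codelist.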

This means that if you wanted to, for example, suggest a range of new identifier lists for a particular country, you could fork the project repository, make your updates, and then submit a pull request. We’ll review requests, and merge updated data into the authoritative copy of the list.

Next steps for this work are to:

  • Create a schema for our JSON structure;
  • Add in some automated tests for pull requests;
  • Look at better synchronisation with AirTable;
  • Provide a user-friendly single-record editing interface for new contributions.

If you would like to test out this new set-up by submitting some new identifier lists – just get in touch.

 

Our approach to prefixes/list codes

For the Identify-Org codelist we are following the precedent set in the IATI Organisation Identifier Agency codelist of assigning a code to each list of organisation identifiers based on: (a) the jurisdiction covered by the list; and (b) an acronym or short-name for the list in question.

This provides more memorable prefixes, though at the cost of:

  • Linking prefixes to a particular language version of the list name or acronym;
  • Prefixes becoming less intuitive if an organisation identifier list changes its name in future (as we would not want to change the prefix in this case, because this would break existing data interoperability).

The alternative to following the IATI approach would be to assign each organisation identifier list its own dumb identifier, possibly following the approach of numerical identifiers used in the Global Legal Entity Identifier list of company registries. However, doing this would make Identify-Org codes incompatible with existing IATI and Open Contracting Data.

Our methodology in full for assigning prefixes is given below:

Each organisation list prefix is made up of two parts: a jurisdiction code, and a list code.

1) Jurisdiction code

For any list which contains entries only from a given country, the two-letter ISO 3166-1 alpha-2 country code should be used.

For lists that contain entries from multiple countries, one of the following codes should be used.

  • XM – Multilateral/international agencies. – The list contains multilateral or international agencies. These organisations are generally not registered at the national level.
  • XI – International. – The list contains organisations based anywhere in the world. The organisations may or may not be registered at the national level.
  • XR – Regional. – The list contains organisations based within a particular region. For example, organisations within the European Union, or within Africa only.
  • ZZ – Publisher created. – This list was created by a publisher, and is maintained by that publisher alone.

(We rely here on the fact that in ISO 3166-1 the following alpha-2 codes can be user-assigned: AA, QM to QZ, XA to XZ, and ZZ. We avoid any widely used X codes.)
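The jurisdiction rules above can be sketched as a small lookup. This is an illustration only: the function name describe_jurisdiction is hypothetical, and the check only confirms that a country code has the right two-letter shape, not that ISO 3166-1 has actually assigned it to a country.

```python
import re

# The four multi-country codes listed above, all taken from the
# ISO 3166-1 user-assigned range.
SPECIAL_JURISDICTIONS = {
    "XM": "Multilateral/international agencies",
    "XI": "International",
    "XR": "Regional",
    "ZZ": "Publisher created",
}


def describe_jurisdiction(code):
    """Return a short description of a jurisdiction code, or raise ValueError."""
    if code in SPECIAL_JURISDICTIONS:
        return SPECIAL_JURISDICTIONS[code]
    if re.fullmatch(r"[A-Z]{2}", code):
        # Shape check only: we do not look the code up in ISO 3166-1.
        return "Country (ISO 3166-1 alpha-2 shape)"
    raise ValueError(f"Not a valid jurisdiction code: {code!r}")
```

For example, describe_jurisdiction("XR") returns "Regional", while describe_jurisdiction("aus") raises ValueError.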

2) List code

The list code is manually assigned. In general, list codes have been designed so that:

  1. They are between 2 and 7 characters long;
  2. They use a recognisable acronym or contraction of the name of the organisation list;
  3. Acronyms should be based on the local language version of the name;
  4. Codes should be memorable, so that a user can become familiar with them. However, there is no intention that the list code portion of the prefix will carry any semantics (e.g. type of organisations listed etc.).

Points 1 – 4 are guidance only. Breaking any of these principles shall not be grounds for revision of an existing code.
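To make the two-part structure concrete, here is a minimal sketch that splits a prefix and reports on the length guidance in point 1. The hyphen separator is inferred from example prefixes such as AU-ABN, and both function names are hypothetical. Because the guidance is advisory, the length check returns notes rather than rejecting a code.

```python
def split_prefix(prefix):
    """Split a prefix such as 'CA-CRA_ACR' into (jurisdiction, list_code).

    Assumes the two parts are joined by the first hyphen in the string.
    """
    jurisdiction, separator, list_code = prefix.partition("-")
    if not separator or not jurisdiction or not list_code:
        raise ValueError(f"Expected JURISDICTION-LISTCODE, got {prefix!r}")
    return jurisdiction, list_code


def list_code_advisories(list_code):
    """Return advisory notes for a list code; an empty list means no notes.

    Only the 2 to 7 character guidance is checked here. The other points
    (recognisable, local-language, memorable) need human judgement.
    """
    notes = []
    if not 2 <= len(list_code) <= 7:
        notes.append("length is outside the suggested 2 to 7 characters")
    return notes
```

For instance, split_prefix("AU-ABN") returns ("AU", "ABN"), and list_code_advisories("ABN") returns an empty list.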

Procedure

When a new organisation list is identified, the creator/proposer may suggest a list code.

There will be a short window for proposed prefixes to be reviewed before they are confirmed.

To confirm a code, the researcher should check:

  • The proposed list code is based on a clear understanding of the name of the underlying organisation list or list provider.
  • Any requirements for multi-lingual codes. For example, in Canada, acronyms should always be given in both their English and French forms. This is achieved using an underscore separator in the list code. E.g. CA-CRA_ACR for the Canada Revenue Agency / Agence du revenu du Canada.
  • The code does not duplicate an existing code, or AKA value.

If a list code is later replaced, the full deprecated prefix should be recorded in the AKA field. This will allow systems to warn about deprecated prefixes and to notify users of their replacements.
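A minimal sketch of how a consuming system might use the AKA field follows. The record layout is an assumption: each entry is taken to be a dict with a "code" key holding the current prefix and an optional "aka" key holding a list of deprecated prefixes. The function names are also hypothetical.

```python
import warnings


def is_prefix_available(candidate, entries):
    """True if candidate matches no existing code and no AKA value."""
    taken = set()
    for entry in entries:
        taken.add(entry["code"])
        taken.update(entry.get("aka", []))
    return candidate not in taken


def resolve_prefix(prefix, entries):
    """Return the current prefix, warning if a deprecated one was given."""
    for entry in entries:
        if prefix == entry["code"]:
            return prefix
    for entry in entries:
        if prefix in entry.get("aka", []):
            warnings.warn(
                f"{prefix} is deprecated; use {entry['code']} instead",
                DeprecationWarning,
                stacklevel=2,
            )
            return entry["code"]
    raise KeyError(prefix)
```

With a made-up entry such as {"code": "ZZ-NEW", "aka": ["ZZ-OLD"]}, resolve_prefix("ZZ-OLD", entries) returns "ZZ-NEW" and emits a deprecation warning, while is_prefix_available("ZZ-OLD", entries) returns False.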

Researching organisation identifier lists: methodology and early findings

Over the last few weeks we’ve been testing out our methodology for researching organisation identifier lists.

Identify-org.net will prospectively include any list of organisations that assigns them a consistent identifying number or string, or provides enough information to disambiguate one organisation from another. However, we need to be able to identify which lists should be preferred over others as offering the best chance of delivering interoperable identifiers.

We also need to make sure that users searching for a potential id for an organisation have enough information to choose the best source, and then to locate and make use of the right identifier.

In our draft researchers’ handbook we set out a number of key definitions, and the steps to go through in researching any identifier list.

For each identifier list we aim to:

  • Assign a meaningful prefix;
  • Identify whether the list is a primary register, or a secondary list of organisations;
  • Describe how identifiers from the list are assigned;
  • Describe the jurisdictions, legal types and sectors that the identifiers in the list cover;
  • Find example identifiers;
  • Document any bulk access available to the list, and the license of any data.

We also aim to document any mappings that might exist between lists: for example, when a charity register also records the company numbers of companies with charitable status. And we capture key ‘Need to Know’ facts that a user might want to be aware of when researching identifiers in a particular country or sector.

Early findings

So far, we’ve reviewed around 30 organisation identifier list entries imported from the IATI Organisation Registration Agency codelist, and worked through the research methodology to update the meta-data about them.

Some key observations so far:

How identifiers are written down matters

For example, the Australian Business Number is a nine-digit identifier, but is generally written down as an 11-digit number, with the first two digits acting as a check-sum for the identifier itself. When presented on screen, systems often show the nine-digit version as three triplets (e.g. ‘123 456 789’), but download a dataset of the numbers and you will find them as a single string (123456789).

When constructing a unique organisation identifier using the prefix for the Australian Business Number list (AU-ABN), how should this be written?

We need to develop general principles (e.g. remove spaces) and develop specific guidance for each identifier list where there is a risk of ambiguity.
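As a sketch of what a ‘remove spaces’ principle could look like in practice, the function below strips all whitespace from an identifier before attaching the list prefix. Joining the prefix and the identifier with a hyphen is an assumption made for illustration, as is the function name; which form of the number to use (nine or 11 digits) is exactly the kind of list-specific guidance that would still be needed.

```python
def build_org_identifier(prefix, raw_identifier):
    """Combine a list prefix with an identifier, removing all whitespace.

    Assumption: prefix and identifier are joined with a hyphen.
    """
    compact = "".join(raw_identifier.split())
    if not compact:
        raise ValueError("identifier is empty after removing whitespace")
    return f"{prefix}-{compact}"
```

With this rule, both ‘123 456 789’ and ‘123456789’ produce the same value, AU-ABN-123456789, when combined with the AU-ABN prefix.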

NGO registration is a complex business

A number of the entries on the IATI Registration Agency Codelist are for Government Ministries responsible for overseeing NGOs operating in their country. In some cases, we managed to locate a register that the agency holds, although in other cases, we couldn’t find mention of a register at all*.

These registers often cover ‘NGOs operating in the country’ and so they might act as the primary identifier for local NGOs, but only as a secondary identifier for international NGOs operating in the country.

We need to review our ‘primary’ and ‘secondary’ identifier distinction, to identify whether some further gradation is needed (e.g. lists that are ‘primary for some organisations’).

*In these cases we’ve marked the identifier list as ‘deprecated’ ready to potentially be removed.

Single government identifiers may provide a pragmatic option

In many countries, business registration takes place at a local level, through Chambers of Commerce or other entities. The same entity might have to be registered in a number of different states. Other entities, like Charities, may not have to register at all.

However, often there exists a register of all the organisations interacting with government in some form. For example, the Australian Business Number mentioned above is described like this:

“The Australian Business Number (ABN) enables businesses in Australia to deal with a range of government departments and agencies using a single identification number.”

When organisations transact with government, they generally end up with an ABN – and there is a national dataset of these identifiers.

In our model, this is a secondary identifier (there is no solid guarantee that it uniquely and persistently picks out a single legal entity), but pragmatically, it may be much easier to find and use than a local company registration. And having a single dataset that covers companies, charities and other entities is much easier to work with than lots of disparate datasets in a country.

We need to consider how this will affect the way we prioritise identifier lists – and to make sure we clearly document the nature of each identifier list.

Governments are moving towards unified identifier databases

We’ve found a number of cases where governments have either recently built, or are working on, national datasets that aggregate together state-level identifiers, such as from individual Chambers of Commerce.

In the best cases, these registers might also include identifiers for government agencies as well.

There may be opportunities to advocate for these centralised directories to follow best practices for open data, and to promote common standards for register publication.

Where next?

We’ll continue on our first ‘research sprint’ for the next three weeks, aiming to confirm around 100 organisation identifier lists. As part of this we’re also seeking to work out how long researching each list takes on average, in order to think about a sustainable approach to keeping Identify-Org.net updated.

We’ll then be taking a short pause over the Christmas break, and returning to research in the new year with a refined methodology based on all we have learnt so far.

Press release: New joint initiative launched to build the next key piece of open data infrastructure

Following up from our sessions at the International Open Data Conference, the partners have announced the initiative with the release below. 

A group of leading open data standards bodies have announced an exciting new collaboration to tackle a shared problem: how to accurately identify organisations.

The identify-org initiative was launched on Friday 7th October at the International Open Data Conference (IODC) in Madrid. It brings together key organisations driving standards for open data across a range of sectors including contracting, extractives, international aid, agriculture and philanthropy. A challenge shared by all these initiatives is how to accurately and consistently identify an organisation. Whether it’s a charity in the UK, a company in Malaysia, or a government department in Canada, the ability to describe these different entities in a consistent way is key to opening and linking up data about their activities, ensuring it is accessible and useful.

The initiative will establish an open interface so that anyone can find known organisation registries; it will also embark on a research process to highlight others. Acting together, the International Aid Transparency Initiative (IATI), Open Contracting Partnership, 360Giving, Joined Up Data Standards (JUDS) and the Initiative for Open Ag Funding will build a key piece of open data infrastructure to enable the free exchange of information on entities, regardless of sector or jurisdiction.

To do this, the project partners have agreed to support efforts to gather and share information on different registers of organisations across the world, backed up by a common methodology to describe these in open data. In turn, this will provide a foundation for the open data community to both share and use identifiers about organisations, using this common protocol.

The International Aid Transparency Initiative (IATI) kick-started this work in 2012. IATI has agreed to share its initial efforts with identify-org, enabling others to build upon the ‘list of lists’ of registries of organisations.  

By working together, the project partners are pooling their knowledge of the different organisation registers that are currently available so it can be used by different standards bodies in a consistent way.

Launching Identify-org.net

Back in 2011 a workshop took place on the fringes of the Warsaw Open Data Camp to discuss the challenge of uniquely identifying organisations in open data.

If we can’t clearly identify organisations, many potential applications of open data, like tracing funding flows and understanding the relationship between different power holders, are made much more difficult.

That Warsaw workshop ended by identifying the need for sustained collaboration on an authority list of organisation registers, and a focus on identifying government agencies.

In the intervening years, there have been a number of small steps forward, but a common approach to identifying any type of organisation remains a missing piece of the open data ecosystem. Until now.

We’re delighted to be kicking off a collaboration between a number of leading open data projects and standards, facilitated by Open Data Services Co-operative, to finally bring together a robust ‘list of lists’ that will form the foundation for joined up organisation identifiers across different open datasets and data standards.

Over the last month we’ve been doing the groundwork for this project, and we’ll be taking our first public steps at the Data Standards Day of the International Open Data Conference in Madrid.

If you want to get involved in shaping the project, you can get in touch at data@identify-org.net, or sign up as a supporter here.