What is HLA-HD?

HLA-HD (HLA typing from High-quality Dictionary) accurately determines HLA alleles with 6-digit precision from NGS data (fastq format). RNA-Seq data can also be used.

Note that HLA-HD is freely available for academic purposes, for non-commercial research purposes, and for evaluation, limited to six months, with a view to entering into a commercial license. Kyoto University reserves the right to modify the terms and restrictions at any time.

News

October 7, 2024 : Version 1.7.1 was released

February 9, 2023 : Version 1.7.0 was released

January 5, 2023 : Version 1.6.1 was released

December 27, 2022 : Version 1.6.0 was released

June 22, 2022 : Version 1.5.0 was released

March 10, 2021 : Version 1.4.0 was released

July 22, 2020 : Version 1.3.0 was released

June 25, 2018 : Version 1.2.0 was released

Released versions

#Version 1.7.1 October 7, 2024

To use HLA-HD with the current IPD-IMGT/HLA database, update HLA-HD to version 1.7.1 or later and re-create the HLA allele dictionary.

#Version 1.7.0 February 9, 2023
Implemented an extraction step for read pairs that overlap HLA exons (increases mapping speed for WGS data).
Increased the calculation speed of hla_est (same speed as version 1.5.0).
Reduced the required memory (approximately 75% less than version 1.6.1).

#Version 1.6.1 January 5, 2023
HLA-DQB2 was added to IMGT/HLA after release 3.50.0. Therefore, the decoy sequences of HLA-DQB2 have been removed from update.dictionary.sh.
Results of “not consistent” are now handled (output as “Not typed”).

#Version 1.6.0 December 27, 2022
Improved read score calculation using intron-mapped reads (this change increases memory and computational costs).
Prepared an HLA_gene.split.txt file (HLA_gene.split.3.50.0.txt) suited to the new IMGT/HLA database (release 3.50.0).
Added decoy sequences (HLA-DQB2 and HLA-U) to update.dictionary.sh.

#Version 1.5.0 June 22, 2022
Fixed a bug when calculating the score due to rounding error.

#Version 1.4.0 March 10, 2021
Accelerated typing speed and reduced memory usage in hla_est.

#Version 1.3.0 July 22, 2020
Accelerated typing speed of hla_est.
Fixed a bug in pm_extract that caused “Couldn’t read result file” to appear in the typing result.

#Version 1.2.1 June 26, 2019
gz-compressed fastq files can now be used as input (zcat is needed).

#Version 1.2.0.1 July 11, 2018
Fixed a bug where hlahd output incorrect positions to read.txt for some genes (DRB6, DRB8, DRB9).

#Version 1.2.0 June 25, 2018
Modified the default dictionary to type HLA-DRB5 and added some genes to HLA_gene.split.txt
(HLA-DPA2, -T, -W, -Y were added to HLA_gene.split.3.32.0.txt for the current release; see Running)

#Version 1.1.0.1 November 15, 2017
Adapted to the reference data of IPD-IMGT/HLA after release 3.30.0.

#Version 1.1.0 October 02, 2017
The database update feature was implemented (see section Updating the HLA dictionary).

#Version 1.0.0 April 27, 2017

Download

Download request

Installation

HLA-HD requires bowtie2 to map NGS reads.
Please install bowtie2 on your computer and add it to your PATH environment variable.
For example, if you are using bash, add the following command to your .bashrc.
export PATH=$PATH:/path_to_bowtie2

Uncompress the downloaded tar.gz file by
> tar -zxvf hlahd.version.tar.gz
Then, move to the uncompressed directory and type
> sh install.sh
The installation requires the g++ compiler from the GNU Compiler Collection to be installed on your computer.

After the installation, add the bin directory of the HLA-HD installation to your PATH.
export PATH=$PATH:/path_to_HLA-HD_install_directory/bin
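
Whether the PATH settings took effect can be checked with a small script. The following is only a sketch: check_tools is a hypothetical helper name, not part of HLA-HD, and it relies on nothing more than the standard `command -v` lookup.

```shell
# Hypothetical helper (not part of HLA-HD): list the commands that cannot
# be found on PATH. Prints one missing command per line and returns 1 if
# at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# bowtie2 and g++ are needed for the installation, hlahd.sh afterwards.
check_tools bowtie2 g++ hlahd.sh || echo "the commands listed above were not found on PATH" >&2
```

If nothing is printed, all three commands were found.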

Updating the HLA dictionary (after v.1.1.0)

You can update the HLA allele dictionary to the current release of the IPD-IMGT/HLA database with the command:
> sh update.dictionary.sh
Wget is required for the database update.

You can also use any release by getting its hla.dat file from the GitHub site.
Put the hla.dat file in the parent directory of hlahd, delete the line of the first wget command from update.dictionary.sh, and then execute update.dictionary.sh.
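
Removing the first wget line can also be done without editing the script by hand. The sketch below assumes that the first wget command occupies a single line that starts with "wget"; strip_first_wget and update.dictionary.local.sh are made-up names, so check the resulting script before running it.

```shell
# Hypothetical helper: print a script with its first wget command line
# removed. Only lines that start with "wget" (after optional indentation)
# are considered, so comments that merely mention wget are kept.
strip_first_wget() {
    awk '
        !removed && /^[ \t]*wget([ \t]|$)/ { removed = 1; next }
        { print }
    ' "$1"
}

# Example (uncomment inside the hlahd directory, with hla.dat already
# placed in the parent directory):
# strip_first_wget update.dictionary.sh > update.dictionary.local.sh
# sh update.dictionary.local.sh
```

Writing to a separate file leaves the original update.dictionary.sh untouched for later updates to the current release.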

The latest release includes the newest rare alleles.
In contrast, older releases tend to yield more conservative results.

Default dictionary of the installation is created from release 3.15.0.

Running

Before running HLA-HD, check the limit on open files on your computer by typing:
> ulimit -Sa
If the open files limit is less than 1024, please type:
> ulimit -n 1024
or change /etc/security/limits.conf according to your system environment.
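
The check and the adjustment can be combined in one step. This is a minimal sketch, assuming a shell whose ulimit builtin accepts -n (bash and dash do); needs_raise is a hypothetical helper name.

```shell
# Hypothetical helper: return 0 (true) when the given soft limit is
# numeric and below the required minimum. "unlimited" never needs raising.
needs_raise() {
    current=$1
    minimum=$2
    [ "$current" != "unlimited" ] && [ "$current" -lt "$minimum" ]
}

soft_limit=$(ulimit -n)
if needs_raise "$soft_limit" 1024; then
    # Raising fails when the hard limit is below 1024; in that case
    # /etc/security/limits.conf has to be changed instead.
    ulimit -n 1024 || echo "could not raise the open files limit" >&2
fi
echo "open files (soft limit): $(ulimit -n)"
```

The new limit applies only to the current shell and the processes started from it, so the snippet has to run in the same session as hlahd.sh.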

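If you have a fastq.gz file and use a version earlier than 1.2.1, decompress it in advance. From version 1.2.1, gz-compressed fastq files can be given as input (zcat is needed).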

You can run HLA-HD by typing the following command:
> hlahd.sh -t [thread_num] -m [minimum length of reads] -c [trimming rate] -f [path_to freq_data directory] fastq_1 fastq_2 gene_split_file path_to_dictionary_directory IDNAME[any name] output_directory

For example:
> hlahd.sh -t 2 -m 100 -c 0.95 -f freq_data/ data/sample_1.fastq data/sample_2.fastq HLA_gene.split.txt dictionary/ sampleID estimation

If you want to type HLA-DPA2, -T, -W, -Y, replace HLA_gene.split.txt with HLA_gene.split.3.32.0.txt and update the dictionary to the current release. (after v.1.2.0)
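
When several samples are processed, it can help to generate the command line per sample and inspect it before running it. The sketch below only prints the command; hlahd_command is a hypothetical helper, and the option values and relative paths are taken from the example and may need adjusting.

```shell
# Hypothetical helper: print the hlahd.sh command line for one sample.
# Arguments: sample ID, read-1 fastq, read-2 fastq, output directory,
# and optionally the number of threads (defaults to 2).
hlahd_command() {
    sample_id=$1
    fastq_1=$2
    fastq_2=$3
    out_dir=$4
    threads=${5:-2}
    echo "hlahd.sh -t $threads -m 100 -c 0.95 -f freq_data/" \
         "$fastq_1 $fastq_2 HLA_gene.split.txt dictionary/ $sample_id $out_dir"
}

# Dry run: show the command without executing it.
hlahd_command sampleID data/sample_1.fastq data/sample_2.fastq estimation
```

Once the printed command looks right, it can be executed by piping the output to sh.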

Options

-m : Reads shorter than this length are ignored. Default is 100.

-t : Number of cores used to execute the program.

-c : Trimming option. If no matching sequence is found in the dictionary, the read is trimmed until a sequence matches or the read reaches this ratio. Default is 1.0.

-f : Use allele frequency information. The default data are in the installation directory (/hlahd.version/freq_data).

Demo

The demonstration of the HLA-HD execution is described in this pdf file.

Tips

Usage of multiple fastq files
HLA-HD cannot take multiple fastq files for each read direction, so merge them in advance.
>cat sample.1_1.fastq sample.2_1.fastq > sample_1.fastq
>cat sample.1_2.fastq sample.2_2.fastq > sample_2.fastq
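
For more than a couple of files, the merge can be wrapped in a small function that refuses to run when an input is missing. This is a sketch with a hypothetical helper name (merge_fastq) and illustrative file names; the read-1 and read-2 files must be listed in the same order so that mates stay at matching positions.

```shell
# Hypothetical helper: concatenate several fastq files into one.
# Usage: merge_fastq OUTPUT INPUT...
# Stops before writing anything if an input file cannot be read.
merge_fastq() {
    output=$1
    shift
    for input in "$@"; do
        if [ ! -r "$input" ]; then
            echo "cannot read $input" >&2
            return 1
        fi
    done
    cat "$@" > "$output"
}

# Example with illustrative file names (uncomment to run):
# merge_fastq sample_1.fastq sample.1_1.fastq sample.2_1.fastq
# merge_fastq sample_2.fastq sample.1_2.fastq sample.2_2.fastq
```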

Using bam files mapped to human genome
If you already have reads mapped to the human genome, you can create fastq files of the MHC region and the unmapped reads by using samtools and Picard tools as follows:
#Extract MHC region
:for GRCh38.p12
>samtools view -h -b sample.hgmap.sorted.bam chr6:28,510,120-33,480,577 > sample.mhc.bam
:for GRCh37
>samtools view -h -b sample.hgmap.sorted.bam chr6:28,477,797-33,448,354 > sample.mhc.bam
#Extract unmapped reads
>samtools view -b -f 4 sample.hgmap.sorted.bam > sample.unmap.bam
#Merge bam files
>samtools merge -o sample.merge.bam sample.unmap.bam sample.mhc.bam
#Create fastq
>java -jar picard.jar SamToFastq I=sample.merge.bam F=sample.hlatmp.1.fastq F2=sample.hlatmp.2.fastq
#Change fastq ID
>cat sample.hlatmp.1.fastq |awk '{if(NR%4 == 1){O=$0;gsub("/1"," 1",O);print O}else{print $0}}' > sample.hla.1.fastq
>cat sample.hlatmp.2.fastq |awk '{if(NR%4 == 1){O=$0;gsub("/2"," 2",O);print O}else{print $0}}' > sample.hla.2.fastq
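
The ID change rewrites only the header line of each four-line fastq record. The sketch below is a variant, not the command above: it replaces the suffix only at the end of the header, so a "/1" occurring elsewhere in the read name is left alone. rename_mate_suffix is a hypothetical helper name and the record is made up.

```shell
# Hypothetical helper: on header lines (every 4th line, starting at
# line 1), turn a trailing "/1" or "/2" into " 1" or " 2". Sequence, "+"
# and quality lines pass through unchanged.
# Usage: rename_mate_suffix MATE < in.fastq > out.fastq   (MATE is 1 or 2)
rename_mate_suffix() {
    awk -v mate="$1" '
        NR % 4 == 1 { sub("/" mate "$", " " mate) }
        { print }
    '
}

# Demonstration on a made-up record; the "/1" in the quality line stays.
printf '@read7/1\nACGT\n+\nII/1\n' | rename_mate_suffix 1
```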

Filtering of reads (March 6, 2019)
For WES or WGS data, bowtie2 aborts in rare cases because it requires vast computational resources. To avoid this problem, you can filter the reads in advance as follows:
#Get full resolution (8-digit) hla sequence information
>wget ftp://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/hla_gen.fasta
#Create bowtie2 index
>bowtie2-build hla_gen.fasta hla_gen
#Map fastq to hla sequence
>bowtie2 -x hla_gen -1 sample_1.fastq -2 sample_2.fastq -S sample.hlamap.sam
or
>bowtie2 -p number_of_cores -x hla_gen -1 sample_1.fastq -2 sample_2.fastq -S sample.hlamap.sam
#Extract mapped reads
>samtools view -h -F 4 sample.hlamap.sam > sample.mapped.sam
#Convert mapped sam to fastq
>java -jar picard.jar SamToFastq I=sample.mapped.sam F=sample.hlatmp.1.fastq F2=sample.hlatmp.2.fastq
#Change fastq ID
>cat sample.hlatmp.1.fastq |awk '{if(NR%4 == 1){O=$0;gsub("/1"," 1",O);print O}else{print $0}}' > sample.hla.1.fastq
>cat sample.hlatmp.2.fastq |awk '{if(NR%4 == 1){O=$0;gsub("/2"," 2",O);print O}else{print $0}}' > sample.hla.2.fastq
After the filtering, use sample.hla.1.fastq and sample.hla.2.fastq as new hlahd input.
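
Before passing the filtered files to hlahd, it may be worth confirming that both still contain the same number of reads. This is an optional sanity check, not a step required by HLA-HD; the sketch assumes four lines per fastq record, and count_reads and same_read_count are hypothetical helper names.

```shell
# Hypothetical helper: number of records in a fastq file, assuming four
# lines per record.
count_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Hypothetical helper: return 0 when both files hold the same number of
# records.
same_read_count() {
    [ "$(count_reads "$1")" -eq "$(count_reads "$2")" ]
}

# Example (uncomment once the filtered files exist):
# same_read_count sample.hla.1.fastq sample.hla.2.fastq || echo "read counts differ" >&2
```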

Reference

Kawaguchi, S. and Matsuda, F. “High-Definition Genomic Analysis of HLA Genes Via Comprehensive HLA Allele Genotyping”, Methods Mol Biol., 2131:31-38, doi: 10.1007/978-1-0716-0389-5_3, 2020
・Scripts and data can be downloaded from here.

Kawaguchi, S. et al. “Comprehensive HLA Typing from a Current Allele Database Using Next-Generation Sequencing Data”, Methods Mol Biol., 1802:225-233, doi: 10.1007/978-1-4939-8546-3_16, 2018.

Kawaguchi, S. et al. “HLA-HD: An accurate HLA typing algorithm for next-generation sequencing data” Hum Mutat., Jul;38(7):788-797, doi: 10.1002/humu.23230, 2017.

Contact:

Shuji Kawaguchi: shuji@genome.med.kyoto-u.ac.jp